@davemac
Last active April 4, 2023 04:32
Bash: wget a remote URL, then extract the URLs from the anchor tags on that page
@andy-shev

According to that link, your script is not correct. It might work in 98% of cases, but that doesn't make it reliable. Yup, that's it.
I would suggest using ElementTree from Python, or any other XML parser, for this. There are also nice XPath-compatible parsers you can use directly from the shell.
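
For illustration, here is a minimal sketch of the ElementTree approach, not a drop-in replacement for the gist. The `SAMPLE` markup and the `extract_links` helper are hypothetical names made up for this example, and the markup is assumed to be well-formed XML (XHTML). Pages in the wild often are not, in which case `ET.fromstring` raises `ParseError` and a more lenient HTML parser would be needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. In practice the markup would be whatever
# wget/curl downloaded, and it must be well-formed XML for this to work.
SAMPLE = """<html><body>
<p>See <a href="https://example.com/a">A</a> and <a href='/b'>B</a>.</p>
<a name="no-href">anchor without an href</a>
</body></html>"""


def extract_links(markup: str) -> list[str]:
    """Return the href value of every <a> element that has one."""
    root = ET.fromstring(markup)
    # ElementTree supports a limited XPath subset; [@href] keeps only
    # anchors that actually carry an href attribute.
    return [a.get("href") for a in root.findall(".//a[@href]")]


if __name__ == "__main__":
    for url in extract_links(SAMPLE):
        print(url)
```

Unlike a regular expression, this does not care about quoting style or attribute order, and it skips anchors that have no `href` at all.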

@davemac
Author

davemac commented Mar 16, 2021

Thanks for the info, much appreciated. I'm curious, how did you come across my gist?

@Roy-Orbison

Another easy way to get a list of URLs from a page is to use lynx: https://unix.stackexchange.com/a/684704
