@ericwastaken
Created March 21, 2018 00:52
given a passed URL, extracts all URIs and returns a list, sorted and unique.
#!/bin/sh
# *********************************************************************
# script: geturilist
# summary: given a passed URL, extracts all URIs and returns a list,
# sorted and unique.
# dependencies: wget; grep
# by: e.a.soto, [email protected]
#
# history
# =======
#
# 2015-09-16: created.
#
# *********************************************************************
# require exactly the one argument we use; report the error on stderr
if [ $# -eq 0 ]
then
    echo "URL must be supplied!" >&2
    exit 1
fi
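# wget the passed URL (-q = quiet; -O- = write the document to stdout instead of a file)
# pipe to grep (-E = extended regex; -o = output only the matches; -i = ignore case)
# then sort -u to sort the matches and drop duplicates
# "$1" is quoted so URLs containing &, ? or spaces reach wget as a single argument
wget -qO- "$1" | grep -Eoi '(http|https|ftp|ftps)://[^"]*' | sort -u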
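
The extraction stage can be tried without touching the network by feeding the same grep and sort pipeline from stdin. The sketch below is only an illustration: the function name `extract_uris` and the sample HTML are made up for this example and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): the same grep | sort
# stages as geturilist, but reading the document from stdin instead of wget.
extract_uris() {
    grep -Eoi '(http|https|ftp|ftps)://[^"]*' | sort -u
}

# Made-up sample input: two distinct URIs, one of them repeated.
sample='<a href="https://example.com/a">A</a>
<img src="http://example.com/logo.png">
<a href="https://example.com/a">A again</a>'

# Run the pipeline on the sample and print the de-duplicated list.
result=$(printf '%s\n' "$sample" | extract_uris)
printf '%s\n' "$result"
```

Because the pattern stops only at a double quote, a URI that is not followed by `"` on the same line (for example one inside single quotes or in plain text) will match through to the end of that line, so the output may need further trimming for such pages.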