bash: wget a remote URL, then extract the URLs from the anchor tags on that page
get from a remote file to STDOUT:

```
wget -qO- https://www.membrane-australasia.org/gallery/imstec-2016-adelaide-part-3/ |
  grep -Eoi '<a [^>]+>' |
  grep -Eo 'href="[^\"]+"' |
  grep -Eo '(http|https)://[a-zA-Z0-9./?=_-]*' |
  uniq
```
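One caveat: `uniq` only removes adjacent duplicates, so the same URL appearing in different parts of the page will survive. Swapping in `sort -u` (my variation, not the original command) deduplicates the whole list:

```
wget -qO- https://www.membrane-australasia.org/gallery/imstec-2016-adelaide-part-3/ |
  grep -Eoi '<a [^>]+>' |
  grep -Eo 'href="[^\"]+"' |
  grep -Eo '(http|https)://[a-zA-Z0-9./?=_-]*' |
  sort -u
```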
get from a remote file to xargs and download each URL:

```
wget -qO- https://www.membrane-australasia.org/gallery/imstec-2016-adelaide-part-1/ |
  grep -Eoi '<a [^>]+>' |
  grep -Eo 'href="[^\"]+"' |
  grep -Eo '(http|https)://[a-zA-Z0-9./?=_-]*' |
  uniq |
  xargs -n 1 -P 24 curl -LO
```

Here `xargs -n 1` passes one URL per invocation, `-P 24` runs up to 24 downloads in parallel, and `curl -LO` follows redirects and saves each file under its remote name.
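If curl isn't installed, a minimal variation using wget for the download stage instead (the parallelism value of 8 is arbitrary; wget follows redirects and saves under the remote file name by default):

```
wget -qO- https://www.membrane-australasia.org/gallery/imstec-2016-adelaide-part-1/ |
  grep -Eoi '<a [^>]+>' |
  grep -Eo 'href="[^\"]+"' |
  grep -Eo '(http|https)://[a-zA-Z0-9./?=_-]*' |
  uniq |
  xargs -n 1 -P 8 wget -q
```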
local file version:

```
grep -Eoi '<a [^>]+>' file.htm |
  grep -Eo 'href="[^\"]+"' |
  grep -Eo '(http|https)://[a-zA-Z0-9./?=_-]*' |
  uniq
```
Sources

https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file
Thanks for the info, much appreciated. I'm curious, how did you come across my gist?
Another easy way to get a list of URLs from a page is to use lynx: https://unix.stackexchange.com/a/684704
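For example, a minimal sketch of that approach (assumes lynx is installed; `-nonumbers` needs a reasonably recent lynx build):

```
# dump only the list of links, without the numeric prefixes lynx normally adds
lynx -dump -listonly -nonumbers https://www.membrane-australasia.org/gallery/imstec-2016-adelaide-part-3/ |
  sort -u
```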
According to that link, the script isn't strictly correct: it may work in 98% of cases, but that doesn't make it reliable.
I would suggest using ElementTree from Python, or any other XML parser, for this. In the shell you can also find nice XPath-compatible parsers.
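As a sketch of that last suggestion, `xmllint` from libxml2 is one such XPath-capable shell tool (assuming it's installed; the grep/sort post-processing is my own addition, not from the comment):

```
# parse the HTML with a real parser, then pull href attributes via XPath
wget -qO- https://www.membrane-australasia.org/gallery/imstec-2016-adelaide-part-3/ |
  xmllint --html --nowarning --xpath '//a/@href' - 2>/dev/null |
  grep -Eo 'https?://[^"]+' |
  sort -u
```

Like the original pipeline, this keeps only absolute http(s) links and drops relative ones.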