Download all AWS whitepapers
wget -O w1.txt http://aws.amazon.com/whitepapers/ && for i in `awk -F'"' '$0=$2' w1.txt | grep pdf | grep -v http`; do wget http:$i ; done
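The one-liner packs several steps together. Here is a commented breakdown as a sketch; the here-doc sample stands in for the real page (whose markup has since changed), and `echo` replaces the actual `wget` download so the logic can be inspected offline:

```shell
# Sample of the old page markup: PDF links were protocol-relative ("//host/path").
cat > w1.txt <<'EOF'
<a href="//media.amazonwebservices.com/AWS_Overview.pdf">Overview</a>
<a href="http://example.com/not-local.pdf">External</a>
EOF

# awk -F'"' '$0=$2' splits each line on double quotes and keeps field 2,
# which is the href value. grep pdf keeps only PDF links; grep -v http
# drops absolute URLs, leaving the protocol-relative //... paths that
# the loop then completes with an "http:" prefix.
for i in $(awk -F'"' '$0=$2' w1.txt | grep pdf | grep -v http); do
  echo "would fetch: http:$i"   # the original gist runs: wget http:$i
done
```

Note that `$0=$2` is an awk assignment used as a pattern: it is truthy whenever field 2 is non-empty, so those lines are printed with `$0` already rewritten to the href.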
@austincloudguru

austincloudguru commented Feb 2, 2017

Thanks, this was extremely helpful. Some of the filenames now have spaces, and some links put target before href, so they don't awk right and get missed. Maybe something like:

wget -O w1.txt http://aws.amazon.com/whitepapers/ && for i in grep -o //[^[:space:]]*.pdf w1.txt|grep whitepaper|sed -e 's/ /%20/g'; do wget http:$i ; done

@mpursley

mpursley commented Sep 5, 2018

@austincloudguru.. I think you missed the backticks or '$()' around the grep... this works for me..

wget -O w1.txt http://aws.amazon.com/whitepapers/ && for i in $(grep -o //[^[:space:]]*.pdf w1.txt|grep whitepaper|sed -e 's/ /%20/g'); do wget http:$i ; done
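For the edge cases raised above (spaces in filenames, target before href), a sketch that handles both more defensively than a field-position awk might look like the following. This is my own variant, not from the thread: it anchors the match on `href="..."` so attribute order doesn't matter, and percent-encodes spaces afterwards. The here-doc sample is illustrative, and `echo` stands in for `wget`:

```shell
# Illustrative input: attribute order varies and one filename has a space.
cat > w1.txt <<'EOF'
<a target="_blank" href="//media.amazonwebservices.com/whitepapers/My Paper.pdf">A</a>
<a href="//media.amazonwebservices.com/whitepapers/Other.pdf" target="_blank">B</a>
EOF

# sed extracts only the href value regardless of surrounding attributes;
# the second sed encodes spaces so the resulting URL is fetchable.
sed -n 's/.*href="\(\/\/[^"]*whitepaper[^"]*\.pdf\)".*/\1/p' w1.txt \
  | sed -e 's/ /%20/g' \
  | while read -r i; do
      echo "would fetch: http:$i"   # the thread's version runs: wget http:$i
    done
```

The `grep -o` approach in the comment above works too; the `href="..."` anchor is just a belt-and-braces way to avoid matching a stray `//...pdf` substring elsewhere in a line.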

@mulatinho

Excellent sir :) 👍

@wang1209

@mpursley works for me (with --no-check-certificate added for the redirect to https):
wget -O w1.txt --no-check-certificate http://aws.amazon.com/whitepapers/ && for i in $(grep -o //[^[:space:]]*.pdf w1.txt|grep whitepaper|sed -e 's/ /%20/g'); do wget http:$i ; done

@chrisdlangton

No longer possible with wget/curl, as the PDF links are now lazy-loaded via JavaScript.
