Created April 28, 2016 14:05
Download all AWS whitepapers
wget -O w1.txt http://aws.amazon.com/whitepapers/ && for i in `awk -F'"' '$0=$2' w1.txt | grep pdf | grep -v http`; do wget http:$i ; done
@austincloudguru I think you missed the backticks or '$()' around the grep... this works for me:
wget -O w1.txt http://aws.amazon.com/whitepapers/ && for i in $(grep -o //[^[:space:]]*.pdf w1.txt|grep whitepaper|sed -e 's/ /%20/g'); do wget http:$i ; done
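For anyone puzzling over that grep/sed pipeline, here is the extraction step in isolation, run against a small made-up HTML sample instead of the live page (the hostnames, paths, and markup below are illustrative, not the real page source):

```shell
# Illustrative stand-in for w1.txt so the pipeline can be run offline.
# The hostnames and filenames are made up for this example.
sample='<a target="_blank" href="//d0.awsstatic.com/whitepapers/aws-overview.pdf">AWS Overview</a>
<a href="//example.com/other.pdf">unrelated PDF</a>'

# grep -o pulls each protocol-relative *.pdf link out of its surrounding tag,
# grep whitepaper keeps only the whitepaper links, and sed percent-encodes
# spaces so the result is safe to hand to wget.
printf '%s\n' "$sample" \
  | grep -o '//[^[:space:]]*.pdf' \
  | grep whitepaper \
  | sed -e 's/ /%20/g'
# → //d0.awsstatic.com/whitepapers/aws-overview.pdf
```

One caveat: `[^[:space:]]*` cannot match across a space, so a URL with an embedded space may still be only partially matched before the sed step ever sees it.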
Excellent sir :) 👍
@mpursley works for me.
wget --no-check-certificate -O w1.txt http://aws.amazon.com/whitepapers/ && for i in $(grep -o //[^[:space:]]*.pdf w1.txt|grep whitepaper|sed -e 's/ /%20/g'); do wget http:$i ; done
No longer possible to wget/curl this, as the PDF links are now lazy-loaded via JavaScript.
Thanks, this was extremely helpful. Some of the filenames now contain spaces, and some links put target before href, so the awk misses them. Maybe something like:
wget -O w1.txt http://aws.amazon.com/whitepapers/ && for i in $(grep -o //[^[:space:]]*.pdf w1.txt|grep whitepaper|sed -e 's/ /%20/g'); do wget http:$i ; done