Created November 6, 2015 04:50
Runs cURL on every URL in the given file and saves each response to a file named after the URL's line number. Each line in the input file should be a URL.
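#!/bin/bash
# Usage: pass the path to a file with one URL per line as the first argument.
mkdir -p scraped-html  # curl -o does not create missing directories
lineNum=1
# The "|| [[ -n ... ]]" clause also handles a last line with no trailing newline.
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Processing line $lineNum : $line"
    # Quote the URL so the shell does not word-split or glob-expand it.
    curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" "$line" -o "scraped-html/$lineNum.html"
    # Random pause of 0 to 279 seconds between requests.
    delayNow=$((RANDOM%10*30+RANDOM%10))
    echo "Waiting for $delayNow sec"
    sleep "$delayNow"
    lineNum=$((lineNum+1))
done < "$1"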
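
The delay formula can be hard to read at a glance, so here is a minimal sketch that isolates it. The helper name `random_delay` and the sampling loop are assumptions added for illustration and are not part of the original script. The formula takes one random digit times 30 plus a second random digit, so the possible values are 0-9, 30-39, 60-69, and so on up to 270-279 seconds.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): same arithmetic as delayNow.
random_delay() {
    echo $(( RANDOM % 10 * 30 + RANDOM % 10 ))
}

# Sample the helper many times and track the smallest and largest values seen.
min=999
max=-1
for _ in $(seq 1 1000); do
    d=$(random_delay)
    if (( d < min )); then min=$d; fi
    if (( d > max )); then max=$d; fi
done
echo "observed delays between $min and $max seconds"
```

If shorter or more uniform pauses are preferred, the multiplier and moduli can be adjusted; for example, `RANDOM % 60` would give a pause of 0 to 59 seconds.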