Created
December 5, 2016 03:34
Fetch Alexa 'Top 1,000,000' site list and munge into a list of domains only.
#!/bin/bash

ALEXA_STATIC_1M="http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"

# Download the zipped CSV; -f makes curl return non-zero on HTTP errors.
echo 'Attempting to fetch Alexa Top 1M archive...'
curl -f -s -o top-1m.csv.zip "$ALEXA_STATIC_1M"
if [ $? -ne 0 ]; then
    echo 'FAILED: Could not fetch file from remote server.'
    exit 1
fi

# Unpack the archive, overwriting any top-1m.csv left by a previous run.
echo 'Attempting to extract archive...'
unzip -o top-1m.csv.zip
if [ $? -ne 0 ]; then
    echo 'FAILED: Could not extract CSV file from archive.'
    exit 1
fi

# Each row is "rank,domain"; keep only the domain column.
echo 'Attempting to prepare list of domains...'
cut -d ',' -f 2 top-1m.csv > top-1m.txt
if [ $? -ne 0 ]; then
    echo 'FAILED: Could not extract domains from list.'
    exit 1
fi
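
The domain-extraction step can also be tried on its own, without downloading anything. The following is a minimal sketch, not part of the original script: `extract_domains` is a hypothetical helper name, and stripping carriage returns is a precaution based on the assumption that the CSV might use CRLF line endings.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): read "rank,domain"
# rows on stdin and print only the domain column.
extract_domains() {
    # Remove carriage returns (assumed possible, not confirmed for this
    # file), then keep the second comma-separated field.
    tr -d '\r' | cut -d ',' -f 2
}

# Try it on a few inline sample rows; no network access is needed.
printf '1,example.com\r\n2,example.org\r\n' | extract_domains
```

With the archive already downloaded, the same helper could replace the extract and cut steps by streaming the CSV, for example `unzip -p top-1m.csv.zip | extract_domains > top-1m.txt`, which avoids writing the intermediate top-1m.csv to disk.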