@chilts
Created October 30, 2013 09:27
Get the Alexa top 1 million sites directly from the server, unzip the file, parse the CSV, and get each line as an array.
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');

// Stream the zip, unpack each entry, and log every CSV row as an array.
request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
    .pipe(unzip.Parse())
    .on('entry', function (entry) {
        entry.pipe(csv2()).on('data', console.log);
    });
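
Since each row arrives as an array, a small post-processing step can make the data easier to work with. The following is a minimal sketch, assuming each row has the shape `[rank, domain]` (for example `['1', 'example.com']`). The helpers `parseRow` and `topN` are hypothetical names and not part of the gist or of `csv2`.

```javascript
// Sketch only: parseRow and topN are hypothetical helpers.
// Assumption: each parsed CSV row is an array of strings [rank, domain].
function parseRow(row) {
  if (!Array.isArray(row) || row.length < 2) return null;
  const rank = Number.parseInt(row[0], 10);
  const domain = String(row[1]).trim().toLowerCase();
  // Reject rows whose rank is not a positive integer or whose domain is empty.
  if (!Number.isInteger(rank) || rank < 1 || domain === '') return null;
  return { rank: rank, domain: domain };
}

// Drop invalid rows, sort by rank, and keep the first `limit` entries.
function topN(rows, limit) {
  return rows
    .map(parseRow)
    .filter(function (r) { return r !== null; })
    .sort(function (a, b) { return a.rank - b.rank; })
    .slice(0, limit);
}

// Made-up sample rows, including one malformed row.
const sample = [['2', 'Example.org'], ['1', 'example.com'], ['x', 'bad'], ['3', 'example.net']];
console.log(topN(sample, 2));
```

In a streaming setup, `parseRow` could be called inside the `'data'` handler instead of `console.log`, so the full list never has to be held in memory.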
@ciscospirit

Hello,
does anyone know how to get the top 1000 for a specific country too?
I'm looking for the Austrian and German top 1000 lists. Can anybody help me out with a download link?

@chilts
Author

chilts commented May 17, 2022

@ciscospirit I don't know any off the top of my head, but perhaps do a search and see what you can find.
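
If no per-country list turns up, one rough approximation is to filter the global list by country-code TLD (for example `.at` or `.de`). This is only a sketch under that assumption: it is not a real per-country ranking, since many sites popular in a country use `.com` or other TLDs. The helper `byTld` is a hypothetical name, and the rows are assumed to be `[rank, domain]` arrays already sorted by rank.

```javascript
// Sketch only: byTld is a hypothetical helper, not part of the gist.
// Assumption: rows are [rank, domain] arrays, already ordered by rank.
// Caveat: a TLD filter only approximates a country list.
function byTld(rows, tld, limit) {
  // Accept 'de', '.de', or '.DE' and normalise to '.de'.
  const suffix = '.' + tld.replace(/^\./, '').toLowerCase();
  const out = [];
  for (const row of rows) {
    const domain = String(row[1] || '').toLowerCase();
    if (domain.endsWith(suffix)) out.push(row);
    // Stop early once enough matches have been collected.
    if (out.length >= limit) break;
  }
  return out;
}

// Made-up sample rows for illustration.
const rows = [['1', 'example.com'], ['2', 'example.de'], ['3', 'example.at'], ['4', 'beispiel.de']];
console.log(byTld(rows, 'de', 1000));
```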

@chilts
Author

chilts commented May 17, 2022

Hi everyone, I just noticed this site on a fork of this gist, and it also seems to be kept up to date:

I don't know if it's useful to anyone, but there we go. :)

@kostasmaneadis

kostasmaneadis commented May 17, 2023

Hey everyone, when I download http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip , the CSV has ".deprecated" as its file extension. Is this it? Is it done?

@skacurt

skacurt commented May 17, 2023

@kostasmaneadis Yes, it's no more.

-----------------------------------------------------------------
Notice: This file is deprecated and is not being updated anymore.
        This file was last updated on February 1, 2023.
        This file will not be available from
        http://s3.amazonaws.com/alexa-static/top-1m.csv.zip after
        July 31, 2023.
-----------------------------------------------------------------

@ggmartins

https://radar.cloudflare.com/domains
top 1000000 unordered 🤢

@securitybd

I am having trouble installing WordPress on my new domain, securelines.net.

@d668

d668 commented May 31, 2024

I get access denied when accessing http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

@kostasmaneadis

kostasmaneadis commented May 31, 2024

@evilpie

evilpie commented Jul 28, 2024

This is a link to the Cisco Umbrella popularity list. Luckily, Archive.org has archived the zip: https://web.archive.org/web/20230401000000*/https://s3.amazonaws.com/alexa-static/top-1m.csv.zip

@muserk1977

You can check our website for this notebookdepo
