@wanghaisheng
Forked from aowongster/alexa.js
Created June 28, 2024 05:14
Fetches the Alexa top 1 million sites list directly from the server, unzips it, parses the CSV, and emits each line as an array. Now also handles the Majestic Million list, which is served as a plain (unzipped) CSV.
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');
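// Source URLs: Alexa ships a zipped CSV, Majestic a plain CSV.
const alexa = 'http://s3.amazonaws.com/alexa-static/top-1m.csv.zip';
const majestic = 'http://downloads.majesticseo.com/majestic_million.csv';
const sources = [majestic]; // plain CSV downloads
const zSources = [alexa]; // zipped CSV downloads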
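// Plain CSV: stream the response straight into the CSV parser.
// (A plain request stream never emits 'entry'; that event comes from unzip.Parse().)
const getFile = (url) => {
  request.get(url)
    .pipe(csv2())
    .on('data', console.log); // each row arrives as an array
};
// Zipped CSV: unzip the stream, then parse each archive entry as CSV.
const getZFile = (url) => {
  request.get(url)
    .pipe(unzip.Parse())
    .on('entry', function (entry) {
      entry.pipe(csv2()).on('data', console.log);
    });
};
// Run the downloads only after both functions are defined:
// const bindings cannot be used before they are initialized.
sources.forEach(getFile);
zSources.forEach(getZFile);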
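To make "getting each line as an array" concrete without touching the network, here is a minimal sketch of what a streaming CSV step has to do: buffer partial lines across chunk boundaries and emit one array per completed row. `createRowSplitter` is a hypothetical helper written for illustration, not part of `csv2`, and it assumes simple rows with no quoted fields. The sample data is made up.

```javascript
'use strict';

// Hypothetical helper (not part of csv2): turns streamed text chunks into row arrays.
// Assumes simple comma-separated rows with no quoted fields or embedded commas.
function createRowSplitter(onRow) {
  let buffered = ''; // holds a partial line until its newline arrives
  return {
    write(chunk) {
      buffered += chunk;
      const lines = buffered.split(/\r?\n/);
      buffered = lines.pop(); // the last piece may be an incomplete line
      lines
        .filter((line) => line.length > 0)
        .forEach((line) => onRow(line.split(',')));
    },
    end() {
      // Flush a final line that had no trailing newline.
      if (buffered.length > 0) onRow(buffered.split(','));
      buffered = '';
    },
  };
}

// Usage with made-up sample data; chunk boundaries fall mid-line on purpose.
const rows = [];
const splitter = createRowSplitter((row) => rows.push(row));
['1,exam', 'ple.com\n2,example', '.org\n3,example.net'].forEach((chunk) => {
  splitter.write(chunk);
});
splitter.end();
console.log(rows);
```

A real CSV parser also has to handle quoting and escaping, which is why the gist delegates that step to a library instead of splitting on commas.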