Created June 10, 2016 11:23
Analysis of URLs referenced in enwiki
240517 http://books.google.com
148920 https://books.google.com
143683 http://news.bbc.co.uk
104078 http://www.nytimes.com
100249 http://www.census.gov
85375 http://www.bbc.co.uk
62074 http://factfinder2.census.gov
51794 http://www.stat.gov.pl
47973 http://www.guardian.co.uk
43483 http://news.google.com
40215 http://www.billboard.com
39750 http://www.allmusic.com
38465 http://www.baseball-reference.com
38318 http://www.telegraph.co.uk
28273 http://query.nytimes.com
28050 http://www.washingtonpost.com
27916 http://www.imdb.com
26415 http://www.independent.co.uk
25375 http://www.theguardian.com
25088 https://news.google.com
23352 http://articles.latimes.com
22174 http://geonames.usgs.gov
21098 http://books.google.co.uk
20822 http://www.youtube.com
20467 http://www.ncbi.nlm.nih.gov
20171 http://www.amazon.com
18984 http://www.dailymail.co.uk
18178 http://www.mtv.com
17789 https://archive.org
17718 http://www.usatoday.com
17249 https://www.youtube.com
16969 http://www.cbc.ca
16967 http://www.abc.net.au
16642 http://espn.go.com
16615 http://sports.espn.go.com
16437 http://www.soccerbase.com
16359 http://www.reuters.com
16223 http://www.time.com
16059 http://www.cricketarchive.com
15399 http://www.smh.com.au
15383 http://www.metacritic.com
14971 http://www.rollingstone.com
14856 http://www.discogs.com
14732 http://www.archive.org
14471 http://www.portal.state.pa.us
14370 http://www.sports-reference.com
13950 http://pqasb.pqarchiver.com
13805 http://nla.gov.au
13754 http://www.huffingtonpost.com
13737 http://www.animenewsnetwork.com
13603 http://www.highbeam.com
13432 http://tvbythenumbers.zap2it.com
13154 http://www.gamespot.com
13048 http://www.digitalspy.co.uk
12259 http://www.cnn.com
12084 http://www.espncricinfo.com
12037 http://www.wwe.com
11453 http://www.hollywoodreporter.com
11113 http://www.forbes.com
11102 http://www.thehindu.com
11029 http://www.rsssf.com
10927 http://www.hindu.com
10917 http://select.nytimes.com
10910 http://www.ew.com
10760 http://books.google.ca
10371 https://itunes.apple.com
10256 http://www.variety.com
10228 http://pwtorch.com
10226 http://www.bloomberg.com
10217 http://timesofindia.indiatimes.com
10193 http://www.nba.com
10156 http://factfinder.census.gov
9888 http://www.latimes.com
9784 http://articles.timesofindia.indiatimes.com
9759 http://www.sfgate.com
9485 http://www.theage.com.au
9468 http://www.nzherald.co.nz
9362 http://www.boston.com
9360 http://www.ign.com
9358 http://www.uefa.com
9175 http://online.wsj.com
9112 http://www.rte.ie
9000 http://www.collectionscanada.gc.ca
8988 http://www.pro-football-reference.com
8657 http://www.timesonline.co.uk
8605 http://www.basketball-reference.com
8539 http://www.allmovie.com
8498 http://nl.newsbank.com
8453 http://www.nydailynews.com
8436 http://www.bizjournals.com
8328 http://www.independent.ie
8197 http://slam.canoe.ca
8168 http://www.officialcharts.com
8102 http://www.jstor.org
8070 http://www.npr.org
8054 http://www.nps.gov
7941 http://www.rottentomatoes.com
7794 http://www.nhc.noaa.gov
7700 http://www.fifa.com
7659 http://www.flightglobal.com
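# Download and unpack the TSV file from
# https://zenodo.org/record/55004#
# Print the first scheme-plus-host prefix (e.g. http://example.org) found on each line.
perl -wnE 'say $1 if /(https?:\/\/[^\/"]+)/' enwiki_2016-06-01_CS1_citations.tsv > enwiki-baseurls.txt
# Count how often each base URL occurs and list the most frequent first.
sort enwiki-baseurls.txt | uniq -c | sort -n -r > enwiki-output.txt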
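The counts above treat http:// and https:// variants of the same host as separate entries (for example, books.google.com appears twice). One possible follow-up, sketched here as an assumption and not as part of the original analysis, is to strip the scheme before counting. The function name `count_hosts` is hypothetical, and the input in the demo is made-up sample data, not taken from the TSV file.

```shell
# Hypothetical helper (not in the original gist): read base URLs on stdin,
# drop the leading http:// or https://, then count hosts, most frequent first.
count_hosts() {
  sed -E 's#^https?://##' | sort | uniq -c | sort -n -r
}

# Demo on made-up sample input; both books.google.com lines fall into one bucket.
printf '%s\n' \
  'http://books.google.com' \
  'https://books.google.com' \
  'http://www.nytimes.com' | count_hosts
```

Against the real data this could be run as `count_hosts < enwiki-baseurls.txt`. Hosts that differ only by a `www.` prefix (such as archive.org and www.archive.org) would still be counted separately under this sketch.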