philshem/6ff513ebed7b1972ded5
Last active August 29, 2015 14:10
Collect traffic stats to compare the sites in the Stack Exchange network.
| site | users | visits_per_day | answers | answered | site_age_months | questions | questions_per_day |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Puzzling | 4100.0 | 2400.0 | 2900.0 | 98% | 7 | 1000.0 | 10.0 |
| Expatriates | 1700.0 | 968.0 | 1100.0 | 92% | 9 | 761.0 | 1.8 |
| Personal Finance & Money | 15000.0 | 11000.0 | 18000.0 | 96% | 52 | 8800.0 | 8.8 |
| LEGO® Answers | 2900.0 | 2300.0 | 2100.0 | 97% | 38 | 1100.0 | 1.6 |
| Bitcoin | 14000.0 | 4900.0 | 13000.0 | 88% | 40 | 8000.0 | 6.1 |
| Pets | 2100.0 | 4600.0 | 2500.0 | 93% | 14 | 1600.0 | 3.7 |
| Genealogy & Family History | 1400.0 | 520.0 | 2100.0 | 97% | 26 | 922.0 | 0.9 |
| Italian Language | 982.0 | 545.0 | 955.0 | 99% | 13 | 495.0 | 1.5 |
| Space Exploration | 3600.0 | 2400.0 | 3000.0 | 93% | 17 | 1900.0 | 3.6 |
| Open Data | 3900.0 | 778.0 | 1900.0 | 91% | 19 | 905.0 | 2.4 |
| Photography | 22000.0 | 24000.0 | 32000.0 | 98% | 53 | 12000.0 | 8.2 |
| SharePoint | 23000.0 | 43000.0 | 56000.0 | 70% | 44 | 44000.0 | 48.0 |
| Network Engineering | 7800.0 | 8000.0 | 4600.0 | 87% | 19 | 2900.0 | 8.7 |
| Ask Different | 76000.0 | 226000.0 | 76000.0 | 72% | 52 | 48000.0 | 53.0 |
| Raspberry Pi | 16000.0 | 15000.0 | 8500.0 | 76% | 30 | 5900.0 | 7.6 |
| Computer Science | 19000.0 | 5800.0 | 11000.0 | 87% | 33 | 8200.0 | 18.0 |
| Blender | 5800.0 | 7300.0 | 7000.0 | 90% | 19 | 5500.0 | 16.0 |
| Ask Patents | 6600.0 | 994.0 | 2800.0 | 76% | 27 | 1500.0 | 2.2 |
| Tridion | 1200.0 | 804.0 | 4500.0 | 96% | 22 | 2600.0 | 2.7 |
| Anime & Manga | 5600.0 | 10000.0 | 5500.0 | 87% | 24 | 4000.0 | 7.4 |
| Spanish Language | 3800.0 | 4800.0 | 4800.0 | 100% | 37 | 1900.0 | 3.4 |
| Joomla | 1300.0 | 1600.0 | 1900.0 | 94% | 8 | 1200.0 | 5.5 |
| Programming Puzzles & Code Golf | 25000.0 | 3800.0 | 26000.0 | 98% | 47 | 2800.0 | 3.4 |
| Music: Practice & Theory | 9600.0 | 13000.0 | 13000.0 | 99% | 44 | 4600.0 | 5.4 |
| Drupal Answers | 26000.0 | 25000.0 | 57000.0 | 69% | 45 | 45000.0 | 41.0 |
| Robotics | 4400.0 | 1300.0 | 2200.0 | 86% | 26 | 1300.0 | 1.9 |
| Magento | 11000.0 | 21000.0 | 18000.0 | 73% | 23 | 15000.0 | 50.0 |
| Electrical Engineering | 40000.0 | 53000.0 | 70000.0 | 90% | 51 | 37000.0 | 58.0 |
| Unix & Linux | 79000.0 | 168000.0 | 87000.0 | 82% | 52 | 54000.0 | 67.0 |
| Super User | 292000.0 | 546000.0 | 391000.0 | 71% | 65 | 247000.0 | 173.0 |
| Web Applications | 49000.0 | 38000.0 | 24000.0 | 80% | 54 | 16000.0 | 12.0 |
| Stack Overflow | 3700000.0 | 6900000.0 | 14000000.0 | 74% | 77 | 8500000.0 | 7500.0 |
| Sound Design | 4800.0 | 4000.0 | 20000.0 | 95% | 49 | 6100.0 | 2.6 |
| Science Fiction & Fantasy | 27000.0 | 41000.0 | 37000.0 | 95% | 47 | 17000.0 | 17.0 |
| Computational Science | 7500.0 | 2200.0 | 5500.0 | 84% | 37 | 3500.0 | 4.7 |
| Software Recommendations | 7000.0 | 3400.0 | 3900.0 | 57% | 10 | 3700.0 | 11.0 |
| Christianity | 8500.0 | 17000.0 | 16000.0 | 99% | 40 | 6200.0 | 7.0 |
| The Workplace | 20000.0 | 18000.0 | 19000.0 | 100% | 32 | 5600.0 | 11.0 |
| Bicycles | 11000.0 | 11000.0 | 15000.0 | 98% | 52 | 5300.0 | 4.1 |
| Role-playing Games | 11000.0 | 11000.0 | 29000.0 | 100% | 52 | 10000.0 | 9.5 |
| Sustainable Living | 1700.0 | 1000.0 | 1300.0 | 98% | 23 | 626.0 | 0.6 |
| Biology | 7400.0 | 7000.0 | 8600.0 | 87% | 36 | 6700.0 | 9.6 |
| Home Improvement | 19000.0 | 60000.0 | 25000.0 | 89% | 53 | 14000.0 | 16.0 |
| Cryptography | 13000.0 | 3800.0 | 7900.0 | 90% | 41 | 5500.0 | 9.3 |
| Board & Card Games | 7300.0 | 8200.0 | 9700.0 | 98% | 50 | 4600.0 | 4.4 |
| Meta Stack Exchange | 118000.0 | 7200.0 | 109000.0 | 88% | 66 | 69000.0 | 17.0 |
| Japanese Language | 5200.0 | 2800.0 | 8900.0 | 99% | 43 | 5300.0 | 4.8 |
| Aviation | 4200.0 | 4900.0 | 4100.0 | 99% | 12 | 1900.0 | 3.1 |
| Astronomy | 2900.0 | 648.0 | 2100.0 | 95% | 15 | 1400.0 | 3.3 |
| Video Production | 5900.0 | 2300.0 | 3000.0 | 91% | 48 | 2000.0 | 2.9 |
| Economics | 325.0 | 170.0 | 264.0 | 83% | 1 | 156.0 | 7.4 |
| English Language Learners | 11000.0 | 20000.0 | 21000.0 | 98% | 23 | 12000.0 | 28.0 |
| ExpressionEngine® Answers | 3400.0 | 1400.0 | 11000.0 | 80% | 25 | 8900.0 | 7.7 |
| Chemistry | 6300.0 | 11000.0 | 7500.0 | 91% | 32 | 5600.0 | 17.0 |
| Project Management | 9500.0 | 2700.0 | 8000.0 | 99% | 46 | 2200.0 | 2.3 |
| Freelancing | 4300.0 | 592.0 | 1600.0 | 98% | 19 | 600.0 | 0.7 |
| Quantitative Finance | 7400.0 | 2800.0 | 5900.0 | 79% | 47 | 3700.0 | 6.5 |
| Webmasters | 33000.0 | 11000.0 | 32000.0 | 97% | 53 | 19000.0 | 13.0 |
| Server Fault | 188000.0 | 280000.0 | 337000.0 | 82% | 68 | 186000.0 | 99.0 |
| Arqade | 57000.0 | 274000.0 | 85000.0 | 92% | 53 | 50000.0 | 44.0 |
| Movies & TV | 10000.0 | 21000.0 | 10000.0 | 93% | 37 | 6500.0 | 10.0 |
| Chinese Language | 4600.0 | 1200.0 | 5700.0 | 99% | 36 | 2100.0 | 4.6 |
| MathOverflow | 39000.0 | 19000.0 | 94000.0 | 80% | 63 | 56000.0 | 40.0 |
| Personal Productivity | 9300.0 | 1300.0 | 6500.0 | 100% | 42 | 1700.0 | 1.6 |
| Ebooks | 1900.0 | 1000.0 | 787.0 | 89% | 12 | 464.0 | 0.7 |
| Linguistics | 4200.0 | 1700.0 | 4000.0 | 81% | 39 | 2500.0 | 3.6 |
| Philosophy | 8300.0 | 2400.0 | 8600.0 | 93% | 42 | 3500.0 | 3.8 |
| Motor Vehicle Maintenance & Repair | 5900.0 | 19000.0 | 7000.0 | 89% | 45 | 4000.0 | 3.7 |
| Amateur Radio | 1800.0 | 510.0 | 1200.0 | 98% | 14 | 713.0 | 1.1 |
| Tor | 2900.0 | 2500.0 | 1800.0 | 78% | 15 | 1400.0 | 2.6 |
| French Language | 4800.0 | 5000.0 | 5900.0 | 100% | 40 | 2500.0 | 2.5 |
| Travel | 15000.0 | 17000.0 | 17000.0 | 99% | 42 | 9400.0 | 14.0 |
| Buddhism | 1400.0 | 466.0 | 2500.0 | 98% | 6 | 895.0 | 2.1 |
| Emacs | 1800.0 | 800.0 | 1800.0 | 91% | 3 | 1000.0 | 11.0 |
| Skeptics | 16000.0 | 14000.0 | 6800.0 | 85% | 46 | 5300.0 | 4.0 |
| Signal Processing | 8200.0 | 4800.0 | 7400.0 | 77% | 40 | 5700.0 | 8.5 |
| Theoretical Computer Science | 19000.0 | 2200.0 | 11000.0 | 78% | 52 | 6400.0 | 4.9 |
| Software Quality Assurance & Testing | 7700.0 | 4900.0 | 5800.0 | 87% | 43 | 2400.0 | 4.6 |
| Cognitive Sciences | 4700.0 | 1400.0 | 3000.0 | 79% | 35 | 2400.0 | 3.5 |
| Android Enthusiasts | 59000.0 | 150000.0 | 33000.0 | 65% | 51 | 26000.0 | 42.0 |
| Homebrewing | 4500.0 | 4700.0 | 8100.0 | 99% | 49 | 3400.0 | 1.6 |
| The Great Outdoors | 3100.0 | 2500.0 | 3900.0 | 99% | 35 | 1600.0 | 1.7 |
| Arduino | 4000.0 | 3300.0 | 2200.0 | 74% | 10 | 1600.0 | 7.6 |
| Code Review | 51000.0 | 34000.0 | 34000.0 | 95% | 47 | 20000.0 | 31.0 |
| Information Security | 49000.0 | 40000.0 | 39000.0 | 95% | 49 | 18000.0 | 25.0 |
| Stack Apps | 19000.0 | 424.0 | 2000.0 | 64% | 57 | 1800.0 | 1.2 |
| Community Building | 753.0 | 10.0 | 589.0 | 100% | 5 | 215.0 | 0.5 |
| History of Science and Math | 455.0 | 82.0 | 308.0 | 88% | 2 | 195.0 | 1.5 |
| Data Science | 3200.0 | 402.0 | 892.0 | 76% | 7 | 505.0 | 2.6 |
| Graphic Design | 25000.0 | 44000.0 | 20000.0 | 91% | 47 | 10000.0 | 16.0 |
| Database Administrators | 43000.0 | 76000.0 | 39000.0 | 82% | 47 | 28000.0 | 36.0 |
| Academia | 19000.0 | 12000.0 | 17000.0 | 97% | 34 | 6800.0 | 17.0 |
| Sports | 3200.0 | 3800.0 | 3000.0 | 93% | 34 | 1800.0 | 2.6 |
| History | 5900.0 | 4900.0 | 6600.0 | 94% | 38 | 3400.0 | 4.6 |
| Russian Language | 3200.0 | 974.0 | 2900.0 | 100% | 30 | 1000.0 | 1.1 |
| Game Development | 43000.0 | 21000.0 | 42000.0 | 91% | 53 | 23000.0 | 20.0 |
| User Experience | 44000.0 | 11000.0 | 41000.0 | 97% | 52 | 14000.0 | 11.0 |
| Beer | 1500.0 | 740.0 | 675.0 | 97% | 11 | 320.0 | 0.2 |
| Physical Fitness | 9100.0 | 7100.0 | 9200.0 | 96% | 45 | 4300.0 | 3.1 |
| Mi Yodeya | 4100.0 | 2900.0 | 23000.0 | 87% | 43 | 13000.0 | 14.0 |
| Startups | 1700.0 | 135.0 | 905.0 | 93% | 5 | 440.0 | 2.3 |
| Martial Arts | 2200.0 | 1200.0 | 2400.0 | 98% | 35 | 666.0 | 0.6 |
| Mathematics Educators | 2400.0 | 445.0 | 2800.0 | 96% | 9 | 796.0 | 1.5 |
| Windows Phone | 4200.0 | 5000.0 | 2300.0 | 86% | 32 | 1600.0 | 1.6 |
| Mathematica | 16000.0 | 9300.0 | 32000.0 | 89% | 35 | 20000.0 | 34.0 |
| Writers | 7900.0 | 3500.0 | 8100.0 | 99% | 49 | 2700.0 | 2.6 |
| Chess | 4200.0 | 2200.0 | 3600.0 | 98% | 31 | 1500.0 | 2.7 |
| Hinduism | 1100.0 | 1500.0 | 1400.0 | 80% | 6 | 908.0 | 3.6 |
| Mathematics | 155000.0 | 106000.0 | 536000.0 | 79% | 53 | 365000.0 | 601.0 |
| Craft CMS | 1100.0 | 736.0 | 2200.0 | 96% | 6 | 1700.0 | 7.6 |
| Earth Science | 1200.0 | 744.0 | 944.0 | 89% | 8 | 714.0 | 2.4 |
| Programmers | 128000.0 | 60000.0 | 125000.0 | 96% | 51 | 34000.0 | 28.0 |
| Parenting | 7800.0 | 14000.0 | 9600.0 | 100% | 45 | 2700.0 | 1.4 |
| TeX - LaTeX | 54000.0 | 64000.0 | 106000.0 | 93% | 53 | 77000.0 | 70.0 |
| Ask Ubuntu | 236000.0 | 364000.0 | 225000.0 | 66% | 53 | 175000.0 | 156.0 |
| German Language | 6700.0 | 6300.0 | 10000.0 | 100% | 43 | 4100.0 | 8.2 |
| Worldbuilding | 2400.0 | 1700.0 | 2700.0 | 100% | 3 | 625.0 | 5.7 |
| Salesforce | 9300.0 | 18000.0 | 25000.0 | 81% | 29 | 19000.0 | 42.0 |
| Gardening & Landscaping | 4000.0 | 2400.0 | 6000.0 | 99% | 42 | 3400.0 | 2.5 |
| Seasoned Advice | 20000.0 | 84000.0 | 29000.0 | 99% | 53 | 11000.0 | 7.9 |
| Biblical Hermeneutics | 3300.0 | 4700.0 | 4500.0 | 97% | 38 | 2200.0 | 2.1 |
| Poker | 1700.0 | 868.0 | 1400.0 | 99% | 35 | 547.0 | 0.7 |
| WordPress Development | 44000.0 | 33000.0 | 66000.0 | 75% | 52 | 51000.0 | 46.0 |
| Physics | 44000.0 | 46000.0 | 70000.0 | 85% | 49 | 44000.0 | 71.0 |
| Stack Overflow em Português | 12000.0 | 11000.0 | 19000.0 | 93% | 12 | 12000.0 | 54.0 |
| English Language & Usage | 71000.0 | 210000.0 | 119000.0 | 98% | 52 | 45000.0 | 54.0 |
| Reverse Engineering | 4900.0 | 2200.0 | 2500.0 | 92% | 21 | 1500.0 | 2.9 |
| Cross Validated | 45000.0 | 42000.0 | 51000.0 | 65% | 53 | 47000.0 | 68.0 |
| Politics | 2800.0 | 1800.0 | 2300.0 | 87% | 24 | 1400.0 | 2.1 |
| Geographic Information Systems | 31000.0 | 32000.0 | 55000.0 | 75% | 53 | 43000.0 | 55.0 |
| Islam | 4800.0 | 6800.0 | 6000.0 | 84% | 30 | 3400.0 | 3.6 |
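The table above is the tab-separated output of the script in this gist. The sketch below shows one way to load that output and rank the sites by a derived metric such as answers per question. It is a minimal example that assumes the text keeps the header row shown above. `load_stats`, `answers_per_question` and `rank` are hypothetical helpers and are not part of the gist.

```python
import csv
import io

# Numeric columns of the table; "answered" is a percentage string handled separately.
NUMERIC_COLUMNS = ['users', 'visits_per_day', 'answers',
                   'site_age_months', 'questions', 'questions_per_day']


def load_stats(text):
    """Parse tab-separated stats text (header row first) into a list of dicts.

    Hypothetical helper: assumes the same header as the data file above.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter='\t'):
        row = {'site': record['site']}
        for column in NUMERIC_COLUMNS:
            row[column] = float(record[column])
        # "98%" -> 0.98
        row['answered'] = float(record['answered'].rstrip('%')) / 100
        rows.append(row)
    return rows


def answers_per_question(row):
    """Hypothetical comparison metric: average answers posted per question."""
    return row['answers'] / row['questions']


def rank(rows, key):
    """Return the rows sorted from the highest to the lowest value of key(row)."""
    return sorted(rows, key=key, reverse=True)
```

Suppose the script's output has been saved to a file; any path will do. Then `rank(load_stats(open(path, encoding='utf-8').read()), answers_per_question)` lists the sites with the most answers per question first. Passing a different key, such as `lambda r: r['questions_per_day']`, compares the sites on any other column.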
# -*- coding: utf-8 -*-
# Collect traffic data from the Stack Exchange sites page and print it as
# tab-separated values (Python 3; requires requests and beautifulsoup4).
import requests
from bs4 import BeautifulSoup
from collections import defaultdict

# Output column order, matching the header of the data file above.
COLUMNS = ['users', 'visits_per_day', 'answers', 'answered',
           'site_age_months', 'questions', 'questions_per_day']


def main():
    url = 'http://stackexchange.com/sites?view=list#traffic'
    r = requests.get(url)
    r.raise_for_status()
    soup = BeautifulSoup(r.content, 'html.parser')
    # The loop below assumes these elements alternate: an "lv-info" element
    # (holding the site link) followed by its "lv-stats-wrapper" element.
    stats = soup.find_all(True, {"class": ["lv-info", "lv-stats-wrapper"]})
    result = defaultdict(dict)
    for i in range(1, len(stats), 2):
        site = stats[i - 1].find('a').get_text(strip=True)
        # Whitespace-separated tokens of the stats block; each figure is
        # read from a fixed position in this list.
        data = stats[i].get_text(strip=False).split()
        # "total" data type (abbreviated with k/m suffixes)
        result[site]['questions'] = clean_totals(data[0])
        result[site]['answers'] = clean_totals(data[2])
        result[site]['questions_per_day'] = clean_totals(data[10])
        result[site]['users'] = clean_totals(data[6])
        result[site]['visits_per_day'] = clean_totals(data[8])
        # year/month data type
        result[site]['site_age_months'] = clean_date(data[12])
        # percentage data type, kept as text (e.g. "98%")
        result[site]['answered'] = data[4]
    # Print the header once, then one row per site in the same column order,
    # so that header and values always line up.
    print('\t'.join(['site'] + COLUMNS))
    for site, row in result.items():
        print('\t'.join([site] + [str(row[col]) for col in COLUMNS]))


def clean_totals(value):
    """Convert an abbreviated count such as '3.7m', '4.1k' or '982' to a float."""
    if value[-1] == 'm':
        return float(value[:-1]) * 1000000
    elif value[-1] == 'k':
        return float(value[:-1]) * 1000
    else:
        return float(value)


def clean_date(value):
    """Convert a site age of the form '<years>y<months>m', '<years>y'
    or '<months>m' to a total number of months."""
    try:
        years = int(value.split('y')[0])
    except ValueError:
        years = 0
    if 'y' not in value:
        months = int(value.replace('m', ''))
        years = 0
    elif 'm' not in value:
        months = 0
        years = int(value.replace('y', ''))
    else:
        try:
            months = int(value.split('y')[1].split('m')[0])
        except ValueError:
            months = 0
    return years * 12 + months


if __name__ == "__main__":
    main()
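The two parsing helpers above, `clean_totals` and `clean_date`, can also be written with a suffix lookup table and a regular expression. The sketch below is a minimal version that assumes abbreviated totals look like '2.5k', '14m' or '982' and site ages like '4y5m', '4y' or '7m'. Those shapes are inferred from the parsing code and have not been checked against the live page. `parse_total` and `parse_age_months` are hypothetical names, not part of the gist.

```python
import re

# Multipliers for the abbreviated suffixes handled by clean_totals.
_MULTIPLIERS = {'k': 1000, 'm': 1000000}

# Optional "<years>y" part followed by an optional "<months>m" part.
_AGE = re.compile(r'(?:(\d+)y)?(?:(\d+)m)?')


def parse_total(value):
    """Convert an abbreviated count such as '2.5k', '14m' or '982' to a float."""
    suffix = value[-1:]
    if suffix in _MULTIPLIERS:
        return float(value[:-1]) * _MULTIPLIERS[suffix]
    return float(value)


def parse_age_months(value):
    """Convert a site age such as '4y5m', '4y' or '7m' to a number of months."""
    match = _AGE.fullmatch(value)
    # The pattern also matches the empty string, so reject that explicitly.
    if not value or match is None:
        raise ValueError('unrecognised site age: %r' % value)
    years, months = match.groups()
    return int(years or 0) * 12 + int(months or 0)
```

`clean_date` falls back to 0 in some cases when it cannot parse a year or month part. This version is stricter: it raises `ValueError` for any string that does not match one of the three shapes, so a change in the page's format shows up immediately instead of producing a silent 0.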