@philshem
philshem / stackexchange_tag_usage.csv
Created March 28, 2019 16:38
tag counts for all stackexchange network sites
url,tagname,tagcount
"http://3dprinting.StackExchange.com/tags/101-hero|3dprinting","101-hero","1"
"http://3dprinting.StackExchange.com/tags/123d-catch|3dprinting","123d-catch","2"
"http://3dprinting.StackExchange.com/tags/2d|3dprinting","2d","4"
"http://3dprinting.StackExchange.com/tags/3d-design|3dprinting","3d-design","131"
"http://3dprinting.StackExchange.com/tags/3d-models|3dprinting","3d-models","152"
"http://3dprinting.StackExchange.com/tags/3d-pen|3dprinting","3d-pen","1"
"http://3dprinting.StackExchange.com/tags/3d-printerworks|3dprinting","3d-printerworks","1"
"http://3dprinting.StackExchange.com/tags/3dtouch|3dprinting","3dtouch","2"
"http://3dprinting.StackExchange.com/tags/abs|3dprinting","abs","66"
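The `url` column packs two values into one field: the tag-page URL and the site slug, joined by `|` (visible in every row above). A minimal stdlib sketch for splitting them back apart when loading the file (the helper name is an illustration, not part of the gist):

```python
def split_url_field(url_field):
    # The url column bundles "tag-page-url|site-slug" into one value;
    # split on the first '|' to recover both parts.
    url, site = url_field.split('|', 1)
    return url, site
```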
@philshem
philshem / follower_count_201904.csv
Last active April 7, 2019 13:55
twitter follower counts for swiss media (04.2019)
handle count
@20min 394346
@nzz 393044
@blickch 249295
@blickamabend 183293
@tagesanzeiger 178033
@watson_news 118861
@Lematinch 102817
@tdgch 88920
@24heuresch 66644
#!/usr/bin/env python
# coding=utf-8
ignore_list = ('Search-Navigation', 'Tools-What links', 'Top-', 'Contents', 'Magyar')

with open('AcronymsFile.csv', 'r') as inp:
    data = inp.read().split('\n')

with open('clean_AcronymsFile.csv', 'w') as out:
    out.write('acronym' + '\t' + 'definition' + '\n')
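The preview stops right after the header row is written. A minimal sketch of how the filtering loop might continue, assuming the raw file holds lines like `0D - Zero-dimensional` and that any line containing an `ignore_list` entry is wiki-navigation noise to drop (the helper name and the line format are assumptions, not from the gist):

```python
def clean_lines(lines, ignore_list):
    # Yield (acronym, definition) pairs, skipping navigation noise
    # and any line that does not split into exactly two fields.
    for line in lines:
        if any(term in line for term in ignore_list):
            continue
        parts = line.split(' - ', 1)
        if len(parts) == 2:
            yield parts[0].strip(), parts[1].strip()
```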
@philshem
philshem / clean_AcronymFile.csv
Last active August 10, 2022 05:59
cleanup script and csv file (needs some cleaning) based on https://github.com/krishnakt031990/Crawl-Wiki-For-Acronyms
acronym definition
0D Zero-dimensional
1AM Air mechanic 1st class
1D One-dimensional
2AM Air mechanic 2nd class
2D Two-dimensional
2G Second-generation mobile (cellular, wireless) telephone system
2LA Two letter acronym
2Lt 2nd lieutenant
3AM Air mechanic 3rd class
@philshem
philshem / get_top500_favicons.py
Created April 16, 2019 16:27
Download top500 favicons from csv
import requests
import pandas as pd
import os
from io import StringIO

def request_function(domain):
    # Google's favicon service returns a small PNG for any domain.
    domain = domain.replace('/', '')
    url = 'https://www.google.com/s2/favicons?domain=' + domain
    fav = requests.get(url).content
    with open('images' + os.sep + domain + '.png', 'wb') as handler:
        handler.write(fav)  # preview truncated here; writing the fetched bytes is the assumed final step
@philshem
philshem / cadima_clean_metadata.py
Last active May 21, 2019 09:24
Python3 script to clean non-ascii characters from the PDF "Title" metadata field.
# requires python3.x and one non-standard module: `pip install pdfrw`
# pdfs should be in a folder relative to this code, named `pdfs`
import os
from pdfrw import PdfReader, PdfWriter
from glob import glob
import unicodedata

def edit_title_metadata(inpdf):
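The function body is cut off in the preview. The core of "clean non-ascii characters" can be sketched with the already-imported `unicodedata` module; the helper below is an illustration of that step, not the gist's actual code:

```python
import unicodedata

def to_ascii(text):
    # NFKD splits accented characters into base letter + combining mark;
    # encoding to ASCII with errors='ignore' then drops everything non-ASCII.
    normalized = unicodedata.normalize('NFKD', text)
    return normalized.encode('ascii', 'ignore').decode('ascii')
```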
@philshem
philshem / get_nobel_prize_years_until_death.py
Last active October 8, 2019 16:32
Nobel prize winners and their years until death.
#!/usr/bin/env python3
# gets demographics for nobel prize winners
# calculates yearly average of how many years between prize and death
import pandas as pd
import numpy as np

# api endpoint for all nobel winners: https://nobelprize.readme.io/
url = 'http://api.nobelprize.org/v1/laureate.csv'
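The script is truncated after the download URL, but the core arithmetic can be sketched without pandas. This assumes the v1 CSV reports the prize `year` and a `died` date as `YYYY-MM-DD`, with `0000-00-00` marking living laureates (field formats are assumptions about the public API, not shown in the gist):

```python
def years_until_death(prize_year, died):
    # `died` is a 'YYYY-MM-DD' string; living laureates are marked
    # '0000-00-00', for which no interval can be computed.
    if died.startswith('0000'):
        return None
    return int(died[:4]) - int(prize_year)
```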
@philshem
philshem / swiss_housing_dataviz.ipynb
Created November 3, 2019 19:22
swiss_housing_dataviz.ipynb
@philshem
philshem / scrape_bee.py
Last active November 4, 2019 11:42
NYTimes Spelling Bee scraper 🐝☠️
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup
import json

def main():
    # the answers are stored as a JSON object inside the page source
    url = 'https://www.nytimes.com/puzzles/spelling-bee'
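The preview ends at the URL. One way the scrape could continue is pulling the embedded JSON out of the page source with a regex; the `window.gameData` variable name is an assumption about the 2019-era page, not something shown in the gist:

```python
import json
import re

def extract_game_data(html):
    # The puzzle data is embedded as `window.gameData = {...}` inside a
    # <script> tag; capture the object literal and parse it as JSON.
    match = re.search(r'window\.gameData\s*=\s*(\{.*?\})\s*</script>', html, re.DOTALL)
    if not match:
        return None
    return json.loads(match.group(1))
```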