philshem / cleanlist.txt
Last active August 29, 2015 13:55
Scrape the count of Google search results (the counts are very approximate). May require tweaking based on your browser language, etc.
able,academic,addiction,afraid,agricultural,analog,analogue,architectural,art,artistic,assistant,associate,audio,bad,bank,beauty,beauty ,benefits,best,birth,brave,business,busy,campaign,care,career,careers,careful,cheap,chief,clean,clever,client,clinical,co,comfortable,communications,competent,compliance,confidential,congressional,consumer,content,contigencies,core,course,court,customer,dangerous,database,deputy,difficult,digital,dirty,district,doctoral,dramatic,early,economic,education,ejaculation,emotional intelligence,employment ,empty,enrollment,enrolment,environmental,equal opportunity,exciting,executive,expensive,expert,external,faculty,fair,family,famous,fashion,fast,favorite,favourite,fifth,finance,financial,fine,first,food,fourth,free,full,funny,gastronomic,general,goal,good,google,graduate,great,green building,hairstyle,happy,health,home,important,industrial,information,insurance,interesting,internal,investment,jewellry,jewelry,job,junior,kind,language,late,law,lay,lazy,learning,learning development
philshem / get_wiki_pv.py
Last active August 29, 2015 13:56
Collect daily Wikipedia page view counts for an array of terms. In this case, it's 'Advisor' and 'Adviser'. It helps to check that the Wikipedia page exists, first.
import requests
import collections
import time
searchlist = ['Advisor','Adviser']
minyear = 2008
maxyear = 2014
for search in searchlist:
    views = {}
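The preview stops before the request loop, so how the gist walks the `minyear` to `maxyear` range is not shown. One plausible approach is to go month by month. The sketch below is written for Python 3, and `month_keys` is a hypothetical helper, not the gist's own code; it assumes no particular page-view endpoint.

```python
def month_keys(minyear, maxyear):
    """Return 'YYYYMM' strings for every month from minyear through maxyear."""
    return ['%d%02d' % (year, month)
            for year in range(minyear, maxyear + 1)   # inclusive upper bound
            for month in range(1, 13)]


keys = month_keys(2008, 2014)
print(len(keys), keys[0], keys[-1])  # 84 200801 201412
```

Each key could then be combined with a search term to request one month of daily counts at a time.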
philshem / wunderground_current.py
Last active November 15, 2018 09:17
Wunderground API: Streamlined Python 2.7 code using the requests package. You must register at http://www.wunderground.com/weather/api/ and insert your personal key.
import requests
data = requests.get('http://api.wunderground.com/api/INSERT_KEY_HERE/geolookup/conditions/q/Switzerland/Zurich.json').json()
location = data['location']['city']
temp_c = data['current_observation']['temp_c']
print "Current temperature in %s is: %s C" % (location, temp_c)
philshem / email_count.py
Last active May 2, 2017 04:36
Scans a .mbox email file and reports back the frequency of words.
# scans a .mbox email file and reports back the frequency of words
import mailbox
import re
from multiprocessing import Pool
mbox = mailbox.mbox('sample.maxima.mbox')
def main():
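The body of `main()` is cut off in this preview. As a rough illustration of the word-frequency step only, here is a Python 3 sketch that counts words across a list of plain-text strings. `word_frequencies` and `WORD_RE` are hypothetical names, and the tokenizing rule (lowercase letters and apostrophes) is an assumption, not necessarily what the gist does.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def word_frequencies(texts):
    """Count how often each word occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


sample = ["Hello world", "hello again, world of mail"]
freq = word_frequencies(sample)
print(freq.most_common(3))
```

With a real archive, the strings could come from iterating over `mailbox.mbox(...)` and reading each message's payload; multipart messages and character-set decoding are left out of this sketch.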
philshem / twitter_search.py
Last active August 29, 2015 13:56
Search twitter via the API and download all corresponding tweets, for later analysis.
import json
import twitter # https://github.com/bear/python-twitter
import time
def main():
    api = twitter.Api(consumer_key='INSERT',
                      consumer_secret='INSERT',
                      access_token_key='INSERT',
                      access_token_secret='INSERT')
philshem / get_archive.py
Created March 12, 2014 10:58
Wayback Machine: finds historical New York Times front pages that include a certain string, and prints a link to those archived pages.
import requests
import json
from bs4 import BeautifulSoup
site = 'nytimes.com'
for year in xrange(2010,2014+1):
    for month in xrange(1,12+1):
        url = 'http://archive.org/wayback/available?url='+site+'&timestamp='+str(year)+str(month).zfill(2)+str('01')
        r = requests.get(url)
        data = json.loads(r.text)
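The preview ends right after the JSON is parsed. The sketch below (Python 3) shows one way the next step might look: building the same availability URL and pulling the closest snapshot link out of the response. `availability_url` and `closest_snapshot` are hypothetical helpers, and the sample dictionary is hand-written to mirror the `archived_snapshots` / `closest` shape of the availability response, not real output.

```python
def availability_url(site, year, month):
    """Build the availability query for the first day of the given month."""
    return ('http://archive.org/wayback/available?url=%s&timestamp=%d%02d01'
            % (site, year, month))


def closest_snapshot(data):
    """Return the archived URL from a parsed response, or None if absent."""
    closest = data.get('archived_snapshots', {}).get('closest')
    if closest and closest.get('available'):
        return closest.get('url')
    return None


# Hand-written sample in the shape of an availability response (illustrative).
sample = {
    'archived_snapshots': {
        'closest': {
            'available': True,
            'url': 'http://web.archive.org/web/20100101000000/http://nytimes.com/',
            'timestamp': '20100101000000',
            'status': '200',
        }
    }
}

print(availability_url('nytimes.com', 2010, 1))
print(closest_snapshot(sample))
print(closest_snapshot({'archived_snapshots': {}}))  # None: no snapshot found
```

Returning `None` for an empty `archived_snapshots` object keeps the calling loop simple: months without a snapshot can just be skipped.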
philshem / get_latlong.py
Created April 7, 2014 09:45
Get latitude, longitude and other data from the Google Maps Geocoding API
import requests
import json
urlbase = 'http://maps.googleapis.com/maps/api/geocode/json?sensor=false&address='
urlend = 'Zurich,Switzerland'
r = requests.get(urlbase+urlend) # request to google maps api
r=r.json()
if r.get('results'):
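The preview is cut off at the `if`, so what the gist does with the results is not shown. A minimal Python 3 sketch of reading the coordinates from an already-parsed response follows. `first_latlng` is a hypothetical helper, and the sample values are illustrative, not actual API output.

```python
def first_latlng(response):
    """Return (lat, lng) of the first geocoding result, or None if empty."""
    results = response.get('results')
    if not results:
        return None
    location = results[0]['geometry']['location']
    return location['lat'], location['lng']


# Illustrative sample in the shape of a geocoding response (values made up).
sample = {
    'status': 'OK',
    'results': [
        {
            'formatted_address': 'Zurich, Switzerland',
            'geometry': {'location': {'lat': 47.3769, 'lng': 8.5417}},
        }
    ],
}

print(first_latlng(sample))
print(first_latlng({'status': 'ZERO_RESULTS', 'results': []}))
```

Checking for an empty `results` list first mirrors the `r.get('results')` guard in the preview.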
philshem / get_amazon_page.py
Last active September 23, 2022 17:38
Scrape the number of pages in a book from Amazon.com
# Add links to urllist for more pages.
# Code can be expanded to scrape more.
import requests
from bs4 import BeautifulSoup
urllist = [
'http://www.amazon.com/Flash-Boys-Wall-Street-Revolt/dp/0393244660',
'http://www.amazon.com/The-Big-Short-Doomsday-Machine/dp/0393338827'
]
philshem / create_csv_unicode.py
Last active August 29, 2015 13:58
Code to create a CSV file with unicode lookup and HTML escapes.
import sys
with open('unicode.csv','wb') as output:
    for i in xrange(sys.maxunicode):
        output.write(unicode(i))
        output.write(u',')
        output.write(unichr(i).encode('utf-8'))
        output.write(u',')
        output.write(unichr(i).encode('ascii', 'xmlcharrefreplace'))
        output.write(u'\n')
print sys.maxunicode
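For comparison, here is a rough Python 3 sketch of the same idea: one row per code point holding the number, the character, and its HTML numeric escape. `unicode_row` and `write_rows` are hypothetical names, and the sketch swaps in the `csv` module instead of the hand-joined commas used above, which is a change from the gist rather than a description of it.

```python
import csv
import io


def unicode_row(codepoint):
    """Return [number, character, HTML numeric escape] for one code point."""
    ch = chr(codepoint)
    # Non-ASCII characters become '&#NNN;'; ASCII characters pass through.
    escaped = ch.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    return [str(codepoint), ch, escaped]


def write_rows(out, codepoints):
    """Write one CSV row per code point to a text-mode file-like object."""
    writer = csv.writer(out)
    for cp in codepoints:
        writer.writerow(unicode_row(cp))


buf = io.StringIO()
write_rows(buf, [65, 233, 8364])
print(buf.getvalue())
```

Using `csv.writer` means the rows for the comma (code point 44) and the line-break characters are quoted rather than corrupting the file. A full run over every code point would also need to skip or specially handle lone surrogates, which Python 3 refuses to encode as UTF-8.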
# encoding: utf-8
import os
import shelve
import boto.glacier
import boto
from boto.glacier.exceptions import UnexpectedHTTPResponseError
ACCESS_KEY_ID = "XXXXXXXXXXXXX"
SECRET_ACCESS_KEY = "XXXXXXXXXXX"
SHELVE_FILE = os.path.expanduser("~/.glaciervault.db")