import os
import json
import re
import csv
from bs4 import BeautifulSoup
# format numbers
def commafy(x):
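    # Hypothetical body, a guess at the intent based on the name (not the
    # original code): insert thousands separators, e.g. commafy(1234567) -> '1,234,567'.
    try:
        return '{:,}'.format(int(x))
    except (TypeError, ValueError):
        return x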
PREFERRED EMBED CODE (widget):
<div id="2014senatetossupsmap_hype_container" style="position:relative;overflow:hidden;width:600px;height:500px;">
<script type="text/javascript" charset="utf-8" src="http://s3.amazonaws.com/examiner/2014battleground/2014+SENATE+tossups+map.hyperesources/2014senatetossupsmap_hype_generated_script.js?70501"></script>
</div>

NON-PREFERRED EMBED CODE IF THAT DOESN'T WORK (iframe):
<iframe src="http://s3.amazonaws.com/examiner/2014battleground/map.html" width="620" height="500" scrolling="no" frameborder="no"></iframe>
import os
import boto
"""
Mirror the entire nonprofittext S3 bucket, downloading only files that aren't already present locally or whose S3 version is larger than the copy we have.
The only dependency is boto. To install: pip install boto
To run: python download.py
"""
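A rough sketch of how that mirroring loop might look with boto 2; the local mirror directory ('nonprofittext/') is an assumption, and it assumes anonymous access or already-configured credentials:

conn = boto.connect_s3()
bucket = conn.get_bucket('nonprofittext')
for key in bucket.list():
    local_path = os.path.join('nonprofittext', key.name)
    local_dir = os.path.dirname(local_path)
    if local_dir and not os.path.isdir(local_dir):
        os.makedirs(local_dir)
    # fetch only if we don't have the file, or the S3 copy is larger than ours
    if not os.path.exists(local_path) or key.size > os.path.getsize(local_path):
        key.get_contents_to_filename(local_path)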
# This web site lists recent injuries for MLB players in HTML format, but requires you to click each team, etc.:
# http://mlb.mlb.com/mlb/fantasy/injuries/
# To do real data analysis, we want a shitton of records and don't have time to click everywhere.
# So our exercise is to get all available injuries into one easy-to-use spreadsheet.
# By looking at "view source" on the web site, I found that it actually hits another web site, which provides the injuries, trades and other info in a computer-readable format called JSON, which is basically
# the same as python's dictionary type. You can only get one month at a time because there are so many. See it here:
# http://mlb.mlb.com/lookup/json/named.transaction_all.bam?start_date=20120301&end_date=20120401&sport_code='mlb'
# Our code will hit this URL repeatedly for different dates, convert the response into a python object, and then write certain fields from that object to a CSV file.
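A sketch of that loop in the same Python 2 style as the other scripts here; the nesting under transaction_all / queryResults / row and the exact field names are assumptions about the endpoint's response shape, so inspect the JSON before trusting them:

import csv
import json
import urllib2

url = ("http://mlb.mlb.com/lookup/json/named.transaction_all.bam"
       "?start_date=%s&end_date=%s&sport_code='mlb'")
months = [('20120301', '20120401'), ('20120401', '20120501'), ('20120501', '20120601')]

fout = csv.writer(open('transactions.csv', 'wb'))
fout.writerow(['date', 'team', 'player', 'note'])   # assumed field names
for start, end in months:
    data = json.loads(urllib2.urlopen(url % (start, end)).read())
    # assumption: rows live under transaction_all -> queryResults -> row
    for row in data['transaction_all']['queryResults']['row']:
        fout.writerow([row.get('trans_date'), row.get('team'),
                       row.get('player'), row.get('note')])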
# Import all Census 2010 tables into PostgreSQL. Then use BoundaryService to import TIGER shapefiles into PostGIS and join them.
import os
s = """ire_H1.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H10.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11A.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11B.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11C.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
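The pasted listing continues for the rest of the tables; a sketch of how the filenames might be pulled back out of that string and run against the database (the database name and the use of psql via os.system are assumptions):

# first whitespace-separated token on each line is the .sql filename
filenames = [line.split()[0] for line in s.splitlines() if line.strip()]
for f in filenames:
    os.system('psql census2010 -f %s' % f)   # assumed database name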
import os
import psycopg2
# IMPORT AND UNZIP ALL YEARS OF FILES INTO THIS DIRECTORY
# SET VARIABLES IN THE NEXT 3 LINES
path = '/media/sf_bulk/labor/data/'
years = [str(x) for x in range(2000, 2013)]
conn = psycopg2.connect(database="labor", user="", password="")
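A sketch of how the load loop might continue from those settings; the per-year file naming and the target table are assumptions, with psycopg2's copy_expert doing the bulk load:

cur = conn.cursor()
for year in years:
    # assumption: one unzipped CSV per year sitting in `path`
    fname = os.path.join(path, '%s.csv' % year)
    with open(fname) as f:
        cur.copy_expert("COPY wages FROM STDIN WITH CSV HEADER", f)   # assumed table name
    conn.commit()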
OFFICE OF THE MINORITY WHIP,2009Q4,PERSONNEL COMPENSATION,,"D
| OFFICE OF THE MINORITY WHIP,2009Q4,PERSONNEL COMPENSATION,,"D
CAO OPERATIONS MANAGEMENT,2009Q4,TRAVEL,12-03,DARRYL A ATCHIS
| CAO OPERATIONS MANAGEMENT,2009Q4,TRAVEL,12-03,DOUGLAS MASSENG
COMMUNICATIONS,2009Q4,SUPPLIES AND MATERIALS,12-17,,,,FRAMING
| COMMUNICATIONS,2009Q4,SUPPLIES AND MATERIALS,12-17,FRAMING,,,
COMMUNICATIONS,2009Q4,SUPPLIES AND MATERIALS,12-17,,,,FRAMING
""" | |
Ensure the new and old fields uses the same CSV quoting conventions and format decimals the same way (15.00 vs 15 and 16.10 vs 16.1), so we can run a diff without being distracted those differences. | |
""" | |
import csv | |
fin = csv.reader(open('../../archives/3_csv_original/2011Q3-summary-sunlight.csv','r')) | |
fout = csv.writer(open('../../archives/3_csv_original/2011Q3-summary-sunlight-stripped.csv','w')) |
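A sketch of the normalization pass, assuming any field that parses as a number should be rewritten without trailing zeros so 15.00 and 15 (or 16.10 and 16.1) come out identical, while csv.writer applies one quoting convention to everything:

def normalize(field):
    # leave non-numeric fields untouched; strip trailing zeros from decimals
    try:
        value = float(field)
    except ValueError:
        return field
    return ('%f' % value).rstrip('0').rstrip('.')

for row in fin:
    fout.writerow([normalize(field) for field in row])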
import csv
fout = csv.writer(open('cpflat.csv', 'wb'))   # binary mode for csv output on Python 2
def process(i):
    fin = csv.reader(open('ACS_10_1YR_CP0%s.csv' % i, 'r'))
    fin_ann = csv.reader(open('ACS_10_1YR_CP0%s_ann.csv' % i, 'r'))
    fin.next()                 # skip the first header row
    headers = fin.next()[3:]   # second header row, minus the leading geography columns
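    # Hypothetical continuation, not the original code: pair each header label
    # with its value and write one flat row per table/variable.
    for row in fin:
        for header, value in zip(headers, row[3:]):
            fout.writerow([i, header, value])

# assumption about which Comparison Profile tables were downloaded
for i in (2, 3, 4, 5):
    process(i)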