
@zacharysyoung
zacharysyoung / Better_CSV_doc.md
Last active February 4, 2022 16:39
CSVs: reading, processing, writing w/Python

Welcome to CSV w/Python!

CSV files are a good way to share tables of data, and Python's CSV module makes working with them straightforward.

This guide will quickly move you through all the concepts you need to fill in this basic CSV program:

# Open the CSV
# Read all the data
# Extract the header
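One minimal way those three comments might be filled in, sketched against an in-memory stand-in for the file (the real program would use `open("data.csv", newline="")` — the filename is a placeholder):

```python
import csv
import io

# Stand-in for an opened CSV file
f = io.StringIO("name,age\nTami,23\nJohn,54\n")

# Read all the data
rows = list(csv.reader(f))

# Extract the header
header, data = rows[0], rows[1:]
```

After this, `header` is `['name', 'age']` and `data` holds the remaining rows as lists of strings.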
import re
# Replace embedded escaped unicode with their actual unicode values:
#
# `\Not wanted backslashes\ unicode: \u2019\u2026`
#
# to:
#
# `\Not wanted backslashes\ unicode: ’…`
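A sketch of that replacement: decode only the `\uXXXX` escapes with `re.sub` and a `chr()` callback, leaving every other backslash untouched (a plain `unicode_escape` decode would mangle them). This basic version handles BMP code points only, not surrogate pairs:

```python
import re

s = r"\Not wanted backslashes\ unicode: \u2019\u2026"

# Replace only \uXXXX escapes with their actual characters;
# other backslashes pass through unchanged
fixed = re.sub(
    r"\\u([0-9a-fA-F]{4})",
    lambda m: chr(int(m.group(1), 16)),
    s,
)
print(fixed)  # \Not wanted backslashes\ unicode: ’…
```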

Processing 30GB of text

Working out how to read in 30GB of text that has no line-breaking record separator... at least, there was no newline in the short example:

28807644'~'0'~'Maun FCU'~'US#@#@#28855353'~'0'~'WNB Holdings LLC'~'US#@#@#29212330'~'0'~'Idaho First Bank'~'US#@#@#29278777'~'0'~'Republic Bank of Arizona'~'US#@#@#29633181'~'0'~'Friendly Hills Bank'~'US#@#@#29760145'~'0'~'The Freedom Bank of Virginia'~'US#@#@#100504846'~'0'~'Community First Fund Federal Credit Union'~'US#@#@#

To make this right:

  • replace field separator '~' with ,
  • replace record separator #@#@# with \n
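The two replacements, shown on a slice of the example above. For the full 30GB you would read the file in chunks and carry a partial-separator tail across chunk boundaries rather than loading everything at once:

```python
# A short slice of the sample data above
raw = (
    "28807644'~'0'~'Maun FCU'~'US#@#@#"
    "28855353'~'0'~'WNB Holdings LLC'~'US#@#@#"
)

# Field separator '~' becomes a comma; record separator #@#@# becomes a newline
csv_text = raw.replace("'~'", ",").replace("#@#@#", "\n")
print(csv_text)
# 28807644,0,Maun FCU,US
# 28855353,0,WNB Holdings LLC,US
```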

Two ways to get all (sub)pages

You load a "base" URL, and if there are more than 100 results, you load subsequent pages till you have ALL results.

You can accomplish this a number of ways, but two stick out for me:

  • with recursion: scrape_it() sees there are more pages to scrape and calls itself with the next page
  • with a while loop: you assume you might need multiple fetches and do all the work in a loop that keeps running as long as there is a next page

For either method, I mocked up "sample" pages. I hope they're not too abstract, and that you can see there's some data you really care about, reports, and some meta-data that tells you there are more reports to be had, next_url:
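The while-loop version can be sketched like this, with a mocked `fetch()` standing in for a real HTTP GET (the page shapes and the `next_url` key follow the description above; the URLs are made up):

```python
# Mocked "sample" pages: each has the reports you care about,
# plus next_url meta-data (None on the last page)
PAGES = {
    "/reports?page=1": {"reports": ["r1", "r2"], "next_url": "/reports?page=2"},
    "/reports?page=2": {"reports": ["r3"], "next_url": None},
}

def fetch(url):
    # Stand-in for a real HTTP GET
    return PAGES[url]

reports = []
url = "/reports?page=1"
while url:  # keep running as long as there is a next page
    page = fetch(url)
    reports.extend(page["reports"])
    url = page["next_url"]

print(reports)  # ['r1', 'r2', 'r3']
```

The recursive version does the same work, with `scrape_it()` calling itself on `next_url` instead of looping.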

name  POS  id           Ref  ALT  Frequency  CDS  Start  End   sequence_cc
chrM  41   .            C    T    0.002498   CDS  3307   4262  -3265
chrM  42   rs377245343  T    TC   0.001562   CDS  3307   4262  -3264
chrM  55   .            TA   T    0.00406    CDS  4470   5511  -4414
chrM  55   .            T    C    0.001874   CDS  4470   5511  -4414
import argparse
from collections import defaultdict
import csv
class Actor(object):
"""An actor with bounded rationality.
The methods on this class such as u_success, u_failure, eu_challenge are
meant to be calculated from the actor's perspective, which in practice
@zacharysyoung
zacharysyoung / README.md
Last active May 11, 2023 19:29
Validate an Address in Airtable, with SmartyStreets

Validate an Address in Airtable

Validate an address with SmartyStreets from a script in Airtable. You can even use your "free 250 lookups per month"!


SmartyStreets Configuration

Fill in SS_KEY and SS_LICENSE with your SmartyStreets info.

const fetchMachine = Machine({
  id: 'distinct & valid',
  initial: 'new',
  states: {
    'new': {
      on: {
        FOUND_DISTINCT: 'distinct',
        FOUND_NOT_DISTINCT: 'no'
      }
    },
    // stub target states so the config is well-formed
    // (the preview was truncated here)
    'distinct': {},
    'no': {}
  }
});
@zacharysyoung
zacharysyoung / machine.js
Last active June 2, 2021 19:15
Funding/Check states
// Available variables:
// - Machine
// - interpret
// - assign
// - send
// - sendParent
// - spawn
// - raise
// - actions
// - XState (all XState exports)
@zacharysyoung
zacharysyoung / README.md
Last active July 25, 2024 19:58
Import rows of data into individual PDFs

Import rows of data into individual PDFs

How to get data like this...

Name  Age  Street Address  City     State     Zip
Tami  23   123 Main St     Anytown  Anystate  11111
John  54   456 Second Ave  Anytown  Anystate  22222
Troy  39   789 Last Cir    Anytown  Anystate  99999
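The first step is splitting the table into one record per person. A sketch with `csv.DictReader`, using an in-memory stand-in for the file; the `text` built for each row is what a PDF library (e.g. fpdf2 or reportlab — an assumption, the gist doesn't name one) would render into that person's PDF:

```python
import csv
import io

# Stand-in for the CSV file holding the table above
f = io.StringIO(
    "Name,Age,Street Address,City,State,Zip\n"
    "Tami,23,123 Main St,Anytown,Anystate,11111\n"
    "John,54,456 Second Ave,Anytown,Anystate,22222\n"
    "Troy,39,789 Last Cir,Anytown,Anystate,99999\n"
)

pages = []
for row in csv.DictReader(f):
    # One page of text per row; a PDF library would write this
    # out as <Name>.pdf
    text = "\n".join(f"{k}: {v}" for k, v in row.items())
    pages.append((row["Name"], text))

print(len(pages))  # 3
```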