
@zacharysyoung
zacharysyoung / Better_CSV_doc.md
Last active February 4, 2022 16:39
CSVs: reading, processing, writing w/Python

Welcome to CSV w/Python!

CSV files are a good way to share tables of data, and Python's CSV module makes working with them straightforward.

This guide will quickly move you through all the concepts you need to fill in this basic CSV program:

# Open the CSV
# Read all the data
# Extract the header
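One minimal way those three comments might be filled in, sketched against an in-memory stand-in for the file (the real program would use `open("data.csv", newline="")` — the filename is a placeholder):

```python
import csv
import io

# Stand-in for an opened CSV file
f = io.StringIO("name,age\nTami,23\nJohn,54\n")

# Read all the data
rows = list(csv.reader(f))

# Extract the header
header, data = rows[0], rows[1:]
```

After this, `header` is `['name', 'age']` and `data` holds the remaining rows as lists of strings.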
import re
# Replace embedded escaped unicode with their actual unicode values:
#
# `\Not wanted backslashes\ unicode: \u2019\u2026`
#
# to:
#
# `\Not wanted backslashes\ unicode: ’…`
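A sketch of that replacement: decode only the `\uXXXX` escapes with `re.sub` and a `chr()` callback, leaving every other backslash untouched (a plain `unicode_escape` decode would mangle them). This basic version handles BMP code points only, not surrogate pairs:

```python
import re

s = r"\Not wanted backslashes\ unicode: \u2019\u2026"

# Replace only \uXXXX escapes with their actual characters;
# other backslashes pass through unchanged
fixed = re.sub(
    r"\\u([0-9a-fA-F]{4})",
    lambda m: chr(int(m.group(1), 16)),
    s,
)
print(fixed)  # \Not wanted backslashes\ unicode: ’…
```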

Processing 30GB of text

Working out how to read in 30GB of text that has no line-breaking record separator... at least, there was no newline in the short example:

28807644'~'0'~'Maun FCU'~'US#@#@#28855353'~'0'~'WNB Holdings LLC'~'US#@#@#29212330'~'0'~'Idaho First Bank'~'US#@#@#29278777'~'0'~'Republic Bank of Arizona'~'US#@#@#29633181'~'0'~'Friendly Hills Bank'~'US#@#@#29760145'~'0'~'The Freedom Bank of Virginia'~'US#@#@#100504846'~'0'~'Community First Fund Federal Credit Union'~'US#@#@#

To make this right:

  • replace field separator '~' with ,
  • replace record separator #@#@# with \n
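The two replacements, shown on a slice of the example above. For the full 30GB you would read the file in chunks and carry a partial-separator tail across chunk boundaries rather than loading everything at once:

```python
# A short slice of the sample data above
raw = (
    "28807644'~'0'~'Maun FCU'~'US#@#@#"
    "28855353'~'0'~'WNB Holdings LLC'~'US#@#@#"
)

# Field separator '~' becomes a comma; record separator #@#@# becomes a newline
csv_text = raw.replace("'~'", ",").replace("#@#@#", "\n")
print(csv_text)
# 28807644,0,Maun FCU,US
# 28855353,0,WNB Holdings LLC,US
```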

Two ways to get all (sub)pages

You load a "base" URL, and if there are more than 100 results, you load subsequent pages till you have ALL results.

You can accomplish this a number of ways, but two stick out for me:

  • with recursion: scrape_it() sees there are more pages to scrape and calls itself with the next page
  • with a while loop: you assume you might need multiple fetches and do all the work in a loop that keeps running as long as there is a next page

For either method, I mocked up "sample" pages. I hope they're not too abstract, and that you can see there's some data you really care about, reports, and some meta-data that tells you there are more reports to be had, next_url:
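The while-loop version can be sketched like this, with a mocked `fetch()` standing in for a real HTTP GET (the page shapes and the `next_url` key follow the description above; the URLs are made up):

```python
# Mocked "sample" pages: each has the reports you care about,
# plus next_url meta-data (None on the last page)
PAGES = {
    "/reports?page=1": {"reports": ["r1", "r2"], "next_url": "/reports?page=2"},
    "/reports?page=2": {"reports": ["r3"], "next_url": None},
}

def fetch(url):
    # Stand-in for a real HTTP GET
    return PAGES[url]

reports = []
url = "/reports?page=1"
while url:  # keep running as long as there is a next page
    page = fetch(url)
    reports.extend(page["reports"])
    url = page["next_url"]

print(reports)  # ['r1', 'r2', 'r3']
```

The recursive version does the same work, with `scrape_it()` calling itself on `next_url` instead of looping.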

name  POS  id           Ref  ALT  Frequency  CDS  Start  End   sequence_cc
chrM  41   .            C    T    0.002498   CDS  3307   4262  -3265
chrM  42   rs377245343  T    TC   0.001562   CDS  3307   4262  -3264
chrM  55   .            TA   T    0.00406    CDS  4470   5511  -4414
chrM  55   .            T    C    0.001874   CDS  4470   5511  -4414
import argparse
from collections import defaultdict
import csv
class Actor(object):
"""An actor with bounded rationality.
The methods on this class such as u_success, u_failure, eu_challenge are
meant to be calculated from the actor's perspective, which in practice
@zacharysyoung
zacharysyoung / README.md
Last active May 11, 2023 19:29
Validate an Address in Airtable, with SmartyStreets

Validate an Address in Airtable

Validate an address with SmartyStreets from a script in Airtable. You can even use your "free 250 lookups per month"!


SmartyStreets Configuration

Fill in SS_KEY and SS_LICENSE with your SmartyStreets info.

const fetchMachine = Machine({
  id: 'distinct & valid',
  initial: 'new',
  states: {
    'new': {
      on: {
        FOUND_DISTINCT: 'distinct',
        FOUND_NOT_DISTINCT: 'no'
      }
    },
    // stub target states so the config is well-formed
    // (the preview was truncated here)
    'distinct': {},
    'no': {}
  }
});
@zacharysyoung
zacharysyoung / machine.js
Last active June 2, 2021 19:15
Funding/Check states
// Available variables:
// - Machine
// - interpret
// - assign
// - send
// - sendParent
// - spawn
// - raise
// - actions
// - XState (all XState exports)
@zacharysyoung
zacharysyoung / README.md
Last active July 25, 2024 19:58
Import rows of data into individual PDFs

Import rows of data into individual PDFs

How to get data like this...

Name  Age  Street Address  City     State     Zip
Tami  23   123 Main St     Anytown  Anystate  11111
John  54   456 Second Ave  Anytown  Anystate  22222
Troy  39   789 Last Cir    Anytown  Anystate  99999
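The first step is splitting the table into one record per person. A sketch with `csv.DictReader`, using an in-memory stand-in for the file; the `text` built for each row is what a PDF library (e.g. fpdf2 or reportlab — an assumption, the gist doesn't name one) would render into that person's PDF:

```python
import csv
import io

# Stand-in for the CSV file holding the table above
f = io.StringIO(
    "Name,Age,Street Address,City,State,Zip\n"
    "Tami,23,123 Main St,Anytown,Anystate,11111\n"
    "John,54,456 Second Ave,Anytown,Anystate,22222\n"
    "Troy,39,789 Last Cir,Anytown,Anystate,99999\n"
)

pages = []
for row in csv.DictReader(f):
    # One page of text per row; a PDF library would write this
    # out as <Name>.pdf
    text = "\n".join(f"{k}: {v}" for k, v in row.items())
    pages.append((row["Name"], text))

print(len(pages))  # 3
```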