@zacharysyoung
zacharysyoung / commands.sh
Last active June 16, 2021 04:08
Parsing PRT file
#!/bin/bash
# Scan all files (*.PRT) and keep only the lines containing "INSTANT CH4".
grep 'INSTANT CH4' *.PRT |
# There are two different datasets per file with your variables; keeping the
# odd-numbered matches takes the 'INSTANT CH4' line from the first dataset.
awk 'NR % 2 == 1 { print }' |
# Cut out everything *but* the filename/year (first 7 characters) and the
# column for the data point you care about (characters 67 to 76).
cut -c 1-7,67-76 |
# `cut` prints the two ranges back to back (any spaces come from the data
# column's padding); `sed` replaces the first run of spaces with a comma for CSV.
sed -E 's/ +/,/' > INSTANT_CH4.csv  # save the output to a CSV file
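For comparison, the same four steps can be sketched in Python. This is only an illustration, not part of the original gist: the names `grep_lines` and `extract_instant_ch4` are hypothetical, and the sketch assumes the lines it receives look like multi-file `grep` output (`filename:matching line`).

```python
import re


def grep_lines(paths, needle="INSTANT CH4"):
    """Yield 'path:line' for each line containing needle (hypothetical helper).

    This mimics how grep prefixes matches when it is given several files.
    """
    for path in paths:
        with open(path, errors="replace") as handle:
            for line in handle:
                if needle in line:
                    yield f"{path}:{line.rstrip()}"


def extract_instant_ch4(matches):
    """Turn grep-style matches into 'key,value' CSV rows (hypothetical helper)."""
    rows = []
    for match_number, line in enumerate(matches, start=1):
        # Like awk 'NR % 2 == 1': keep the 1st, 3rd, 5th, ... match.
        if match_number % 2 == 0:
            continue
        # Like cut -c 1-7,67-76: 1-based inclusive ranges become
        # the Python slices [0:7] and [66:76].
        picked = line[0:7] + line[66:76]
        # Like sed -E 's/ +/,/': replace only the first run of spaces.
        rows.append(re.sub(r" +", ",", picked, count=1))
    return rows
```

Assuming the PRT files sit in the current directory, `extract_instant_ch4(grep_lines(sorted(glob.glob("*.PRT"))))` (with `import glob`) would produce the rows that the shell pipeline writes to the CSV file.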
zacharysyoung / README.md
Last active January 22, 2021 09:39
NC and PRT files (#BelieveInYourself #NeverGiveUp #Grep)

OMG, I was feeling so low about an hour ago:

  • I was looking at a super-slick shell script written by a pro (I'll call this person "Ace") using tools I hadn't seen before
  • The structure of the NC file was even more daunting at first blush than the PRT files—the PRT was massive, but it was flat
  • A basic idea like the values being the same between the PRT and NC files was eluding me

And I was about to email you this message:

So, here's my analysis of what's going on...

This shit is so far over my head!

zacharysyoung / multiple_emails.js
Last active April 23, 2021 20:04
Find and deal with cells in the spreadsheet that have multiple email addresses
/*
You can run this in Chrome by:
1. going to View > Developer > Developer Tools
2. finding the "Console" tab
3. copying all the stuff below in one chunk, and pasting it into the console
4. hitting <Enter>
After that you can modify a line by copying it, pasting it onto a new line, and hitting <Enter> to re-run that line
*/
zacharysyoung / README.md
Last active July 25, 2024 19:58
Import rows of data into individual PDFs

How to get data like this...

Name  Age  Street Address  City     State     Zip
Tami  23   123 Main St     Anytown  Anystate  11111
John  54   456 Second Ave  Anytown  Anystate  22222
Troy  39   789 Last Cir    Anytown  Anystate  99999
zacharysyoung / machine.js
Last active June 2, 2021 19:15
Funding/Check states
// Available variables:
// - Machine
// - interpret
// - assign
// - send
// - sendParent
// - spawn
// - raise
// - actions
// - XState (all XState exports)
const fetchMachine = Machine({
  id: 'distinct & valid',
  initial: 'new',
  states: {
    'new': {
      on: {
        FOUND_DISTINCT: 'distinct',
        FOUND_NOT_DISTINCT: 'no'
      }
    },
zacharysyoung / README.md
Last active May 11, 2023 19:29
Validate an Address in Airtable, with SmartyStreets

Validate an Address in Airtable

Validate an address with SmartyStreets from a script in Airtable. You can even use your "free 250 lookups per month"!

SmartyStreets Configuration

Fill in SS_KEY and SS_LICENSE with your SmartyStreets info.

import argparse
from collections import defaultdict
import csv
class Actor(object):
    """An actor with bounded rationality.
    The methods on this class such as u_success, u_failure, eu_challenge are
    meant to be calculated from the actor's perspective, which in practice
name  POS  id           Ref  ALT  Frequency  CDS  Start  End   sequence_cc
chrM  41   .            C    T    0.002498   CDS  3307   4262  -3265
chrM  42   rs377245343  T    TC   0.001562   CDS  3307   4262  -3264
chrM  55   .            TA   T    0.00406    CDS  4470   5511  -4414
chrM  55   .            T    C    0.001874   CDS  4470   5511  -4414

Two ways to get all (sub)pages

You load a "base" URL, and if there are more than 100 results, you load subsequent pages until you have ALL results.

You can accomplish this a number of ways, but two stick out for me:

  • with recursion: scrape_it() sees there are more pages to scrape and calls itself with the next page
  • with a while loop: you assume you might need multiple fetches and do all the work in a loop that continues to run as long as there is a next page
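
A minimal sketch of both approaches, under stated assumptions: the `PAGES` dictionary and `fetch()` are hypothetical stand-ins for real HTTP requests, the function names are invented for illustration (the real `scrape_it()` signature is not shown here), and each mocked page is assumed to carry a `reports` list plus a `next_url` that is `None` on the last page.

```python
# Hypothetical mocked pages keyed by URL; a real scraper would fetch these.
PAGES = {
    "base": {"reports": ["r1", "r2"], "next_url": "page2"},
    "page2": {"reports": ["r3", "r4"], "next_url": "page3"},
    "page3": {"reports": ["r5"], "next_url": None},
}


def fetch(url):
    """Stand-in for an HTTP request: return the mocked page for url."""
    return PAGES[url]


def scrape_it_recursive(url):
    """Collect reports from url, then call itself for the next page, if any."""
    page = fetch(url)
    reports = list(page["reports"])
    if page["next_url"]:
        reports += scrape_it_recursive(page["next_url"])
    return reports


def scrape_it_loop(url):
    """Collect reports in a loop that runs as long as there is a next page."""
    reports = []
    while url:
        page = fetch(url)
        reports.extend(page["reports"])
        url = page["next_url"]
    return reports
```

Both functions return the same combined list for the mocked pages. The loop version avoids Python's recursion limit (1000 frames by default), which could matter if a result set spans a very large number of pages.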

For either method, I mocked up "sample" pages. I hope they're not too abstract, and that you can see there's some data you really care about, reports, and some metadata that tells you there are more reports to be had, next_url: