Albert Decatur (albert-decatur)

  • AidData
  • Williamsburg, VA, USA
@albert-decatur
albert-decatur / gist:8ce0bc543e1c3874901a
Created July 9, 2015 23:39
get names in last, first, suffix order
cat civil.csv |trim | parallel 'last=$(echo {} | grep -oE "[^ ]+$" ); if [[ -n $( echo "$last" | grep -iE "jr|sr|ii+") ]]; then suffix="$last"; name=$( echo {} | sed "s:[^ ]\+$::g;s: \+$::g" ); last=$( echo "$name" | grep -oE "[^ ]+$" ); first=$( echo "$name" | sed "s:[^ ]*$::g" ); else first=$( echo {} | sed "s:[^ ]*$::g" ); fi; echo "$last,$first,$suffix"' | trim
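A readable, self-contained sketch of the same suffix-splitting idea (a hedged re-implementation, not the gist's exact code; it assumes one "First Last [Jr|Sr|II]" name per line and only recognizes those suffixes):

```bash
# sketch: rearrange "First Last [Jr|Sr|II...]" into "Last,First,Suffix"
last_first_suffix () {
  while read -r name; do
    last=${name##* }                      # last whitespace-separated token
    suffix=""
    if [[ $last =~ ^([Jj][Rr]|[Ss][Rr]|II+)\.?$ ]]; then
      suffix=$last
      name=${name% *}                     # drop the suffix token
      last=${name##* }                    # real last name is now the final token
    fi
    first=${name% *}                      # everything before the final token
    echo "$last,$first,$suffix"
  done
}
# example: printf '%s\n' "John Smith Jr" "Jane Doe" | last_first_suffix
```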
a="'";in=VA_opendata_FY2015.txt; cat $in | sed '1d' | sed '2d' | parallel --block 1G --pipe -N100000 'grep -vE "^$" | ssconvert --export-type Gnumeric_stf:stf_assistant -O '$a'separator=" "'$a' fd://0 fd://1 2>/dev/null' | tr '\r' '\n' | while read LINE; do if [ -z "$LINE" ]; then echo; else echo -n " $LINE"; fi; done | pv
@albert-decatur
albert-decatur / example function to get RESP bulk string from TSV
Last active August 29, 2015 14:24
redis hash example with RESP bulk string
function tsv2redis { in=$(cat); header=$(echo "$in" | head -n 1 ); for_awk=$( echo "$header" | sed 's:\t:\n:g' | sed 's:^\|$:":g' | nl | trim | sed 's:^:\$:g' | tawk '{print $2,$1}' | tr '\t' ',' | tr '\n' ',' | sed 's:,$::g' | sed 's:"\+:":g'); toCountBytes=$( echo "$in" | sed '1d' | tawk "{print $for_awk}" ); echo "$toCountBytes" | LANG=C tawk '{OFS="\n"; print "*"NF,"$5","HMSET"; for(i=2;i<=NF;i++)print "$"length($i),$i}' | sed 's/$/\r/' ;}; cat /tmp/foo | cut -f6 --complement | tsv2redis | head -n 21
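For context, a minimal sketch of the RESP framing the function above emits: one HMSET as an array of bulk strings, suitable for piping into `redis-cli --pipe` (the key and fields below are made-up examples, not from the gist):

```bash
# sketch: frame one HMSET command as a RESP array of bulk strings
# every element is "$<byte length>\r\n<bytes>\r\n", preceded by "*<number of elements>\r\n"
resp_hmset () {
  local LC_ALL=C                     # so ${#a} counts bytes, as RESP requires
  local args=("HMSET" "$@")
  printf '*%d\r\n' "${#args[@]}"
  local a
  for a in "${args[@]}"; do
    printf '$%d\r\n%s\r\n' "${#a}" "$a"
  done
}
# example: resp_hmset row:1 name "Ada Lovelace" born 1815 | redis-cli --pipe
```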
@albert-decatur
albert-decatur / README.md
Created July 8, 2015 02:51
chartpipe of doctor malpractice data

get an example D3 graph of your data!

example using over one million US Medical Malpractice Cases, 1990-2015

  • input data from the National Practitioner Database here
  • example output here
    • note the huge decline in medical malpractice cases in 2015 - this is just due to incomplete data for that year!

prerequisites

@albert-decatur
albert-decatur / README.md
Last active August 29, 2015 14:24
SPLC hate crime entity viz method
@albert-decatur
albert-decatur / a
Last active August 29, 2015 14:24
SPLC Hate Crimes, 2015-07-02
# method to find most frequently mentioned entities in hate crime reports
## prerequisites:
1. [MITIE NLP](https://github.com/mit-nlp/MITIE)
2. [my ~/.bashrc functions](https://github.com/albert-decatur/dotfiles)
3. [Rio command line frontend to R](https://raw.githubusercontent.com/jeroenjanssens/data-science-at-the-command-line/master/tools/Rio)
```bash
# get the top 15 entities mentioned by SPLC data on select US hate crimes from media reports
curl -sL http://bit.ly/SPLC_hate |\
```
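Purely as a generic illustration of the final counting step (not the continuation of the pipeline above), tallying entity names that have already been extracted, one per line, could look like:

```bash
# sketch: given one extracted entity name per line on stdin, print the 15 most frequent
sort | uniq -c | sort -rn | head -n 15
```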
@albert-decatur
albert-decatur / README.md
Last active August 29, 2015 14:23
SCOTUSterm2014

## SCOTUS Term 2014 Opinion Slip Citations by Year,

for both Opinion of the Court and the Dissenting Opinion

Lookup table to convert chart's IDs into SCOTUS case names here.
Input PDFs from SCOTUS here.
Processed SCOTUS data here.
R for ggplot2 to make chart here.

@albert-decatur
albert-decatur / iati_loc.sh
Created June 28, 2015 03:00
e.g., find . -type f | parallel ./iati_loc.sh
# get pipe separated info on IATI XML locations
xmlstarlet sel -t -m "iati-activities/iati-activity/location/coordinates" -v "concat(../../title,'|',../../description,'|',../../iati-identifier,'|',../../other-identifier,'|',../../participating-org,'|',../../recipient-country,'|',../../sector,'|',../../transaction/value,'|',../../transaction/flow-type,'|',../../transaction/transaction-date,'|',../../transaction/transaction-type,'|',@latitude,'|',@longitude,'|',@precision)" -n $1
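A hedged usage sketch matching the description above: save the xmlstarlet call as iati_loc.sh, run it over a folder of IATI XML files, and prepend a header row matching the concatenated fields (the output file name is a placeholder):

```bash
# sketch: pipe-separated location records from every IATI XML file under the current directory
{
  echo "title|description|iati-identifier|other-identifier|participating-org|recipient-country|sector|value|flow-type|transaction-date|transaction-type|latitude|longitude|precision"
  find . -type f | parallel ./iati_loc.sh
} > iati_locations.psv
```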
@albert-decatur
albert-decatur / citation dates by opinion type
Last active August 29, 2015 14:23
scotus decisions - citation years in court and dissenting opinions
# for each opinion section, count the 4-digit years cited in parentheses; output: section<TAB>count<TAB>year
for section in "Opinion of the Court" "OBERGEFELL v. HODGES ROBERTS, C. J., dissenting"; do pdftotext 14-556_3204.pdf - | grep -vF "Cite as: 576 U. S. ____ (2015)" | tr '\n' ' ' | tr '\f' '\n' | grep -F "$section" | grep -oE "\([0-9]{4}\)" | sed 's:(\|)::g' | sort | uniq -c | sed "s:^\s*::g;s:\s:\t:g;s:^:$section\t:g"; done | less
@albert-decatur
albert-decatur / a_README.md
Last active August 29, 2015 14:22
stickshift example using Google ngrams english 1m

motherland squeezed

aka stickshift is beautiful

how to

  1. get SQLite db of Google ngrams (word counts from Google books OCR'd up to 2008)
  2. run stickshift on server.js, here
  3. you can either rename server.js to example_server.js or just redefine "start" under its package.json
  4. make sure to use your real path to the db! (see the sketch below)
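A hedged sketch of steps 2-4 (the file names, the "start" script, and the db path convention are assumptions about stickshift's layout, not its documented defaults):

```bash
# sketch: wire the example server up to a local Google ngrams SQLite db
cp server.js example_server.js        # step 3, option 1: rename the server script
$EDITOR example_server.js             # step 4: set the real path to your ngrams SQLite db
npm start                             # step 2: assumes package.json's "start" script now runs example_server.js
```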