cat civil.csv | trim | parallel -k 'last=$(echo {} | grep -oE "[^ ]+$"); if [[ -n $(echo "$last" | grep -ixE "(jr|sr)\.?|iii*") ]]; then suffix="$last"; name=$(echo {} | sed "s:[^ ]\+$::;s: \+$::"); last=$(echo "$name" | grep -oE "[^ ]+$"); first=$(echo "$name" | sed "s: *[^ ]*$::"); else first=$(echo {} | sed "s: *[^ ]*$::"); fi; echo "$last,$first,$suffix"' | trim
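The one-liner above starts a new shell for every name. A minimal pure-bash sketch of the same split, offered as an illustration rather than the author's method: `split_name` is a hypothetical helper, it assumes bash 4+ (for `${var,,}`), and it treats only "Jr", "Sr" (optionally followed by a period) and runs of two or more I's as suffixes.

```bash
#!/usr/bin/env bash
# split_name (hypothetical helper): "First Middle Last [Suffix]" -> "Last,First Middle,Suffix"
split_name() {
  local -a words
  local suffix="" last first n
  local re='^((jr|sr)\.?|i{2,})$'   # whole-word suffix match, so "Hawaii" is not a suffix
  read -r -a words <<< "$1"         # split on whitespace, collapsing repeated spaces
  n=${#words[@]}
  # Only treat the last word as a suffix if something precedes it.
  if (( n > 1 )) && [[ ${words[n-1],,} =~ $re ]]; then
    suffix=${words[n-1]}
    (( n-- ))
  fi
  last=${words[n-1]}
  first="${words[*]:0:n-1}"         # remaining words joined by single spaces
  printf '%s,%s,%s\n' "$last" "$first" "$suffix"
}
```

To keep the original's parallelism, the function can be exported to GNU parallel: `export -f split_name; trim < civil.csv | parallel -k split_name`.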
a="'"; in=VA_opendata_FY2015.txt; cat "$in" | sed '1d' | sed '2d' | parallel --block 1G --pipe -N100000 'grep -vE "^$" | ssconvert --export-type Gnumeric_stf:stf_assistant -O '$a'separator=" "'$a' fd://0 fd://1 2>/dev/null' | tr '\r' '\n' | while IFS= read -r LINE; do if [ -z "$LINE" ]; then echo; else echo -n " $LINE"; fi; done | pv
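The `tr '\r' '\n'` plus `while read` tail appears to rely on ssconvert ending each row with CRLF while line breaks inside quoted cells are bare LF (an assumption about the exporter's output, not verified here). Every row end then becomes a blank line, and the loop glues the cell-internal lines back onto one line. A minimal sketch of that loop as a reusable function; `join_records` is a hypothetical name.

```bash
#!/usr/bin/env bash
# join_records (hypothetical helper): a blank line ends a record; the non-blank
# lines in between are concatenated, each prefixed by one space, like the loop above.
join_records() {
  local line
  # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
  # "|| [ -n "$line" ]" also processes a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then
      printf '\n'
    else
      printf ' %s' "$line"
    fi
  done
}
```

As in the original, each output record starts with a space, and a last record not followed by a blank line is printed without a trailing newline.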
function tsv2redis { local in header for_awk toCountBytes; in=$(cat); header=$(echo "$in" | head -n 1); for_awk=$(echo "$header" | sed 's:\t:\n:g' | sed 's:^\|$:":g' | nl | trim | sed 's:^:\$:g' | tawk '{print $2,$1}' | tr '\t' ',' | tr '\n' ',' | sed 's:,$::g' | sed 's:"\+:":g'); toCountBytes=$(echo "$in" | sed '1d' | tawk "{print $for_awk}"); echo "$toCountBytes" | LC_ALL=C tawk '{OFS="\n"; print "*"NF,"$5","HMSET"; for(i=2;i<=NF;i++)print "$"length($i),$i}' | sed 's/$/\r/'; }; cat /tmp/foo | cut -f6 --complement | tsv2redis | head -n 21
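`tsv2redis` emits the Redis protocol (RESP) used for mass insertion: each command is `*<argument count>`, then `$<byte length>` and the argument itself for every argument, all lines ending in CRLF. It issues one `HMSET` per data row, keyed on the first column, with the remaining header names and cell values as field/value pairs. The sketch below reproduces that encoding without the generated awk program. `resp_cmd` and `tsv_to_resp` are hypothetical names; byte lengths come from `wc -c` so multibyte text is counted correctly, and the reader assumes no empty cells (bash collapses consecutive tabs when splitting on IFS) and no tabs or newlines inside values.

```bash
#!/usr/bin/env bash
# resp_cmd (hypothetical helper): encode one command and its arguments as RESP.
resp_cmd() {
  local arg len
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    len=$(printf '%s' "$arg" | wc -c)          # byte length, not character count
    printf '$%d\r\n%s\r\n' "$((len))" "$arg"   # $((...)) strips any padding from wc
  done
}

# tsv_to_resp (hypothetical helper): header row on the first line; each data row
# becomes HMSET <first column> <header2> <value2> <header3> <value3> ...
tsv_to_resp() {
  local -a header row args
  local i
  IFS=$'\t' read -r -a header
  while IFS=$'\t' read -r -a row; do
    args=(HMSET "${row[0]}")
    for ((i = 1; i < ${#header[@]}; i++)); do
      args+=("${header[i]}" "${row[i]}")
    done
    resp_cmd "${args[@]}"
  done
}
```

The output is meant to be piped into `redis-cli --pipe`, e.g. `tsv_to_resp < data.tsv | redis-cli --pipe`.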
# get the top 15 entities mentioned by SPLC data on select US hate crimes from media reports
# method to find the most frequently mentioned entities in hate crime reports
## prerequisites:
1. [MITIE NLP](https://github.com/mit-nlp/MITIE)
2. [my ~/.bashrc functions](https://github.com/albert-decatur/dotfiles)
3. [Rio, a command-line front end to R](https://raw.githubusercontent.com/jeroenjanssens/data-science-at-the-command-line/master/tools/Rio)
```bash
# get the top 15 entities mentioned by SPLC data on select US hate crimes from media reports
curl -sL http://bit.ly/SPLC_hate |\
```
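The pipeline above is cut off after the download, so this does not reconstruct the author's MITIE steps. It only sketches the final ranking step, assuming the entity names have already been extracted one per line (the MITIE output format is not shown here). `top_entities` is a hypothetical name; it counts duplicates with `sort | uniq -c` and keeps the N most frequent, defaulting to the 15 mentioned in the comment.

```bash
#!/usr/bin/env bash
# top_entities (hypothetical helper): one entity per line on stdin;
# prints "count<TAB>entity" for the N most frequent (default 15).
top_entities() {
  local n=${1:-15}
  sort | uniq -c | sort -rn | head -n "$n" |
    # uniq -c pads the count with spaces; keep multi-word entity names intact
    awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print count "\t" $0 }'
}
```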
# get pipe-separated info on IATI XML locations
xmlstarlet sel -t -m "iati-activities/iati-activity/location/coordinates" -v "concat(../../title,'|',../../description,'|',../../iati-identifier,'|',../../other-identifier,'|',../../participating-org,'|',../../recipient-country,'|',../../sector,'|',../../transaction/value,'|',../../transaction/flow-type,'|',../../transaction/transaction-date,'|',../../transaction/transaction-type,'|',@latitude,'|',@longitude,'|',@precision)" -n "$1"
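Each output row has 14 pipe-separated fields in the order of the `concat()` arguments: title, description, iati-identifier, other-identifier, participating-org, recipient-country, sector, transaction value, flow-type, transaction-date, transaction-type, latitude, longitude, precision. XPath 1.0 converts a node-set argument to the string value of its first node, so an activity with several transactions contributes only its first transaction's fields. A minimal sketch of a follow-on step that keeps just the identifier and coordinates; `iati_points` is a hypothetical name, and it assumes rows broken by a literal `|` or a newline inside a text field should be dropped rather than misaligned.

```bash
#!/usr/bin/env bash
# iati_points (hypothetical helper): read the 14-field rows and print
# "iati-identifier,latitude,longitude"; rows with any other field count are skipped.
iati_points() {
  awk -F'|' -v OFS=',' 'NF == 14 { print $3, $12, $13 }'
}
```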
pages=$(pdftotext 14-556_3204.pdf - | grep -vF "Cite as: 576 U. S. ____ (2015)" | tr '\n' ' ' | tr '\f' '\n'); for section in "Opinion of the Court" "OBERGEFELL v. HODGES ROBERTS, C. J., dissenting"; do printf '%s\n' "$pages" | grep -F "$section" | grep -oE "\([0-9]{4}\)" | sed 's:(\|)::g' | sort | uniq -c | sed "s:^\s*::g;s:\s:\t:g;s:^:$section\t:g"; done | less
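The loop works because pdftotext ends each page with a form feed: turning newlines into spaces and form feeds into newlines yields one line per page, and `grep -F "$section"` keeps the pages whose running header names that opinion. A minimal sketch of the per-section count as a function; `section_years` is a hypothetical name, and it counts every parenthesized four-digit number, as the one-liner does.

```bash
#!/usr/bin/env bash
# section_years (hypothetical helper): pdftotext output on stdin; prints
# "count<TAB>year" for parenthesized years on pages containing "$1".
section_years() {
  tr '\n' ' ' | tr '\f' '\n' |   # one line per PDF page
    grep -F -- "$1" |            # keep pages that mention the section heading
    grep -oE '\([0-9]{4}\)' |    # e.g. "(1967)"
    tr -d '()' |
    sort | uniq -c |
    awk '{ print $1 "\t" $2 }'
}
```

For example: `pdftotext 14-556_3204.pdf - | section_years "Opinion of the Court"`.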