@kleinschmidt
Last active January 4, 2016 03:19
Bash script which combines multiple (possibly redundant) .results files returned by Amazon Mechanical Turk's command line tools.
#!/bin/bash
# Author: Dave Kleinschmidt
#
# Concatenate all *.results files in the current directory, printing one
# line per unique assignment ID (the first one encountered).
#
# NOTE: the header isn't guaranteed to be correct for all files. This
# script just takes the first one encountered. Notably, the user-specified
# "Answer.<fieldname>" form element results aren't guaranteed to line up
# across files (unless they line up in each file individually).
#
# Newline-within-double-quotes correction from
# http://www.unix.com/shell-programming-scripting/195671-replace-newline-character-between-double-quotes-space.html
#
# awk one-liner for printing unique lines from
# http://stackoverflow.com/questions/10842118/explain-this-duplicate-line-removing-order-retaining-one-line-awk-command

# List all files to be combined on stderr.
ls *.results 1>&2
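# Pipeline:
#   1. cat: concatenate all .results files.
#   2. first awk: split records on double quotes; every second record is
#      the inside of a quoted field, and rebuilding it ($1=$1) replaces
#      embedded newlines with single spaces.
#   3. second awk: split on tabs and print only lines whose field 19
#      (assignment ID) has not been encountered before.
cat *.results |
    awk '(NR-1)%2{$1=$1}1' RS=\" ORS=\" |
    awk -F'\t' '!x[$19]++'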
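
Because the script hard-codes field 19 as the assignment ID, it relies on every file sharing the same column layout. As a minimal sketch of one way to relax that, the function below looks the key column up by name in the first header row and then applies the same "first occurrence wins" awk idiom. The function name `dedupe_by_column` and the bare, unquoted header name `assignmentid` are assumptions made for illustration; they are not part of the original script, and real .results headers may be quoted differently.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): keep only the first
# line seen for each value of a named column in tab-separated input.
dedupe_by_column() {
    # $1: header name of the key column (assumed, e.g. "assignmentid")
    awk -F'\t' -v col="$1" '
        NR == 1 {
            # Locate the key column index from the first header row.
            for (i = 1; i <= NF; i++) if ($i == col) key = i
            if (!key) { print "column not found: " col > "/dev/stderr"; exit 1 }
        }
        # Print a line only the first time its key value is seen. The
        # header row passes through here too, so repeated headers from
        # later files are dropped like any other duplicate.
        !seen[$key]++
    '
}

# Toy input: a header plus three rows, one repeating assignment A1.
printf 'hitid\tassignmentid\tAnswer.x\nH1\tA1\tfoo\nH1\tA2\tbar\nH1\tA1\tfoo\n' |
    dedupe_by_column assignmentid
```

Run on the toy input, this prints the header followed by the A1 and A2 rows, dropping the repeated A1 row. In the real pipeline it would take the place of the final `awk -F'\t' '!x[$19]++'` stage, provided the header name matches what the files actually contain.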