Gist kleinschmidt/8561146 (last active January 4, 2016 03:19)
Bash script which combines multiple (possibly redundant) .results files returned by Amazon Mechanical Turk's command line tools.
#!/bin/bash
# Author: Dave Kleinschmidt
#
# Concatenate all *.results files in the directory, printing one line
# per unique assignment ID (the first one encountered). NOTE: the header
# isn't guaranteed to be correct for all files; this script just takes
# the first one encountered. Notably, the user-specified
# "Answer.<fieldname>" form element results aren't guaranteed to line
# up (unless they line up in every file).
#
# Newline-within-double-quotes correction from
# http://www.unix.com/shell-programming-scripting/195671-replace-newline-character-between-double-quotes-space.html
# Cute awk one-liner for printing unique lines from
# http://stackoverflow.com/questions/10842118/explain-this-duplicate-line-removing-order-retaining-one-line-awk-command

# list all files to be combined to stderr
ls *.results 1>&2
cat *.results |                          # get all .results files
  awk '(NR-1)%2{$1=$1}1' RS=\" ORS=\" |  # remove newlines within double-quoted fields
  awk -F'\t' 'NF >= 19 && !x[$19]++'     # only print lines whose field 19 (assignment ID)
                                         # hasn't been encountered before; NF >= 19 also
                                         # drops the lone '"' the previous awk appends
                                         # after the final newline
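Because the script keeps only the first header it sees, mismatched "Answer.<fieldname>" columns pass through silently. Below is a minimal sketch of a pre-flight check, assuming (as the script does) that the first line of every *.results file is its header. `check_headers` is a hypothetical helper name, not part of the original gist.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist): warn on stderr
# about every *.results file whose first line (assumed to be its header)
# differs from the header of the first file the glob returns.
# Returns 1 if any header differs, 0 otherwise.
check_headers() {
  local f header reference="" first_header="" status=0
  for f in *.results; do
    [ -e "$f" ] || continue              # glob matched nothing
    header=$(head -n 1 "$f")
    if [ -z "$reference" ]; then
      reference=$f                       # first file sets the reference header
      first_header=$header
    elif [ "$header" != "$first_header" ]; then
      echo "warning: header of $f differs from $reference" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_headers || echo "headers differ; check the Answer.* columns" >&2` before the pipeline would flag the problem without changing the combined output. Comparing whole header lines is deliberately strict: it also catches files that have the same columns in a different order.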
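To see what the two awk stages do, here is a toy run on synthetic data (not real Mechanical Turk output). It is a sketch under stated assumptions: the deduplication key is column 1 rather than column 19 so the sample stays short, and `combine_stdin` and `sample` are hypothetical names.

```shell
#!/bin/bash
# Synthetic sample: a header plus three tab-separated, double-quoted
# records. Assignment "A1" appears twice and its comment field contains
# an embedded newline.
sample=$'"assignmentid"\t"comment"\n"A1"\t"first\nsecond"\n"A2"\t"ok"\n"A1"\t"first\nsecond"\n'

combine_stdin() {
  # Stage 1: with RS set to a double quote, every even-numbered record is
  # text inside quotes; $1=$1 rebuilds it with single spaces, so the
  # embedded newline becomes a space.
  # Stage 2: keep the first line seen for each column-1 value; NF >= 2
  # drops the lone quote that stage 1 appends after the final newline.
  awk '(NR-1)%2{$1=$1}1' RS='"' ORS='"' |
    awk -F'\t' 'NF >= 2 && !seen[$1]++'
}

# Prints the header, the first A1 line (comment now "first second") and
# the A2 line; the duplicate A1 line is dropped.
printf '%s' "$sample" | combine_stdin
```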