@aammd
Created July 20, 2015 16:08
Separates text into lines when the lines are delimited by speaker names in all caps, e.g. "SPEAKERNAME: hurr durr dee ANOTHERSPEAKER: beep boop"
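# Requires gawk: the four-argument form of split() is a gawk extension.
BEGIN {
    FS = "\t"    # input: id <TAB> text
}
## should be possible to catch logs & other extras by finding unsplit lines
{
    # lines[] holds the text between speaker labels, speaker[] the labels themselves;
    # lines[1] (any text before the first label) is not printed
    n = split($2, lines, /[A-Z]'?[A-Z][A-Za-z'\[\] ]*:/, speaker)
    if (n == 1) {
        # no speaker label found: emit the whole field as an "extra"
        printf "%s\t%s\t%s\t%s\n", $1, 1, "extra", $2
    } else {
        # numeric loop keeps rows in order; "for (x in speaker)" order is unspecified
        for (x = 1; x < n; x++) {
            printf "%s\t%s\t%s\t%s\n", $1, x, speaker[x], lines[x+1]
        }
    }
}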
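
For readers without gawk, the same splitting idea can be sketched in Python. This is an illustrative port, not part of the original gist: the function name `split_speakers` and the tuple layout are assumptions chosen to mirror the four tab-separated output columns (id, index, speaker, text). It reuses the script's speaker pattern unchanged.

```python
import re

# Same speaker pattern as the awk script: two capitals (optionally with an
# apostrophe between them), then letters, apostrophes, brackets or spaces,
# ending in a colon.
SPEAKER = re.compile(r"[A-Z]'?[A-Z][A-Za-z'\[\] ]*:")


def split_speakers(record_id, text):
    """Return (id, index, speaker, text) rows for one input record.

    Hypothetical helper: the name and return shape are assumptions made
    for this sketch, not something defined by the gist.
    """
    if not text:
        # gawk's split() returns 0 for an empty string, so nothing is printed
        return []
    speakers = SPEAKER.findall(text)
    if not speakers:
        # no speaker label found: keep the whole field as an "extra" row
        return [(record_id, 1, "extra", text)]
    # pieces[0] is whatever precedes the first label; like the awk script,
    # this sketch drops it
    pieces = SPEAKER.split(text)
    return [
        (record_id, i + 1, spk, pieces[i + 1])
        for i, spk in enumerate(speakers)
    ]


if __name__ == "__main__":
    sample = "SPEAKERNAME: hurr durr dee ANOTHERSPEAKER: beep boop"
    for row in split_speakers("ep1", sample):
        print("\t".join(str(col) for col in row))
```

As in the awk version, the speaker label keeps its trailing colon and the text keeps its surrounding whitespace; trimming either would be a separate step.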