This is a quick example showing how to use regexes to find tri-grams in Shakespeare...well, 570,872 of them, anyway, if we do some basic filtering of non-dialogue.
Though tokenization and n-grams should typically be done using a proper natural language processing framework, it's possible to do in a jiffy from the command-line, using standard Unix tools and ack, the better-than-grep utility.