@Joshfindit
Created April 16, 2014 02:49
Text analysis for word frequency (good for finding common words in marketing material)
#First, create a folder and copy the raw text from each page into its own text file.
#Example: harv_eker1.txt , harv_eker2.txt , harv_eker3.txt , and so on.
#Open a terminal window in that folder, and run the following:
cat *.txt | tr -d '[:punct:]' | tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
#The most-used words, each prefixed by its count, will be at the top
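For long pages the full list can run to thousands of lines, so it may help to wrap the pipeline in a small function that keeps only the top few words. The sketch below is one way to do that; `top_words` is a hypothetical helper name, not part of the original gist, and it assumes a POSIX-style shell with the standard `tr`, `sort`, `uniq`, `grep`, and `head` utilities.

```shell
# Hypothetical helper (an assumption, not from the original gist):
# print the N most frequent words across the given files.
# Usage: top_words N file...
top_words() {
  n="$1"   # how many words to show
  shift    # the remaining arguments are the input files
  cat "$@" |
    tr -d '[:punct:]' |              # strip punctuation
    tr -s '[:space:]' '\n' |         # one word per line (spaces, tabs, newlines)
    tr '[:upper:]' '[:lower:]' |     # fold case so "The" and "the" match
    grep -v '^$' |                   # drop any empty lines
    sort | uniq -c | sort -rn |      # count, then order by count, highest first
    head -n "$n"                     # keep only the top N
}
```

After defining the function in the terminal, something like `top_words 20 *.txt` would list the 20 most frequent words in the folder. Note that stripping punctuation also joins hyphenated words and contractions (for example, "don't" becomes "dont").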