Last active
October 18, 2016 07:52
An extension of Doug McIlroy's program, from the book "Classic Shell Scripting", that processes a text file and outputs a list of the n most frequent words, with their frequency counts, sorted by descending count.
#! /bin/sh -
# Read a text stream on standard input, and output a list of
# the n (default: 25) most frequently occurring words and
# their frequency counts, in order of descending counts, on
# standard output.
#
# Usage:
#       wf [n]
tr -cs A-Za-z\' '\n' |      # Replace nonletters with newlines
  tr A-Z a-z |              # Map uppercase to lowercase
    sort |                  # Sort the words in ascending order
      uniq -c |             # Eliminate duplicates, showing their counts
        sort -k1,1nr -k2 |  # Sort by descending count, and then by ascending word
          sed ${1:-25}q     # Print only the first n (default: 25) lines
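To try the pipeline without saving it to a file, the sketch below wraps the same six stages in a shell function and feeds it one line of text. The function form and the sample sentence are assumptions made for illustration; only the name `wf` comes from the usage comment above. The optional argument is quoted here, which does not change the behaviour for a plain number.

```sh
#!/bin/sh
# Hypothetical wrapper: the same pipeline as a function, for quick experiments.
wf() {
    tr -cs "A-Za-z'" '\n' |   # Replace runs of nonletters with one newline
    tr A-Z a-z |              # Map uppercase to lowercase
    sort |                    # Group identical words together
    uniq -c |                 # Collapse each group to "count word"
    sort -k1,1nr -k2 |        # Descending count, then ascending word
    sed "${1:-25}q"           # Keep the first n lines (default: 25)
}

# Made-up sample input; ask for the 2 most frequent words.
printf 'The cat and the dog. The end.\n' | wf 2
```

With this input, "the" should come first with a count of 3, followed by "and" with a count of 1, since ties are broken alphabetically. The width of the padding in front of each count depends on the local `uniq` implementation.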