This mini-project compares letter frequencies in English text in general with those in an extract from Frank Herbert's Dune.
The average letter frequencies can be found in letter-frequencies.json. This file is a JSON object mapping letters to frequencies, and looks like this:
{
"E": 11.1607,
"M": 3.0129,
"A": 8.4966,
"H": 3.0034,
"R": 7.5809,
"G": 2.4705,
"I": 7.5448,
"B": 2.072,
"O": 7.1635,
...
}
This data was obtained from this source, manually copy-pasted, and parsed with a function that can be found in fetchFrequencies.js.
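The keys of the object are not sorted by value (M at 3.0129 comes before A at 8.4966), so ranking the letters means sorting the entries first. The following is a minimal sketch of that step, not code from this repository: `rankByFrequency` is a hypothetical helper, and `sample` holds only the entries visible in the excerpt above.

```javascript
// Hypothetical helper: returns letters ordered from most to least frequent.
function rankByFrequency(frequencies) {
  return Object.entries(frequencies)
    .sort(([, a], [, b]) => b - a) // descending by frequency value
    .map(([letter]) => letter);
}

// Assumption: only the entries shown in the excerpt; the real file has more.
const sample = {
  E: 11.1607, M: 3.0129, A: 8.4966, H: 3.0034, R: 7.5809,
  G: 2.4705, I: 7.5448, B: 2.072, O: 7.1635,
};

console.log(rankByFrequency(sample).join(" "));
// prints "E A R I O M H G B"
```

Because the elided entries are missing from `sample`, this ranking covers only the nine letters shown, not the full alphabet.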
The sample from the novel is the following:
I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the fear has gone there will be nothing. Only I will remain.
This can be found in dune.txt.
Running this through textToCharFrequencies.js will output the following to the console:
[
[ ' ', 60 ], [ 'e', 32 ], [ 't', 27 ],
[ 'i', 25 ], [ 'a', 18 ], [ 'l', 17 ],
[ 'n', 16 ], [ 'r', 16 ], [ 'h', 15 ],
[ 'o', 12 ], [ 's', 11 ], [ '.', 8 ],
[ 'm', 7 ], [ 'w', 7 ], [ 'f', 6 ],
[ 'g', 5 ], [ 'd', 4 ], [ 'p', 4 ],
[ 'u', 3 ], [ 'b', 3 ], [ 'y', 3 ],
[ '-', 2 ], [ 'k', 1 ], [ 'c', 1 ],
[ 'v', 1 ], [ '\n', 1 ]
]
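A count like the one above can be produced in a few lines. This is a sketch under assumptions rather than the contents of textToCharFrequencies.js: `countCharacters` is a hypothetical name, and the lowercasing step is inferred from the output, which contains only lowercase letters.

```javascript
// Hypothetical helper: counts every character (letters, spaces, punctuation)
// and returns [character, count] pairs sorted by descending count.
function countCharacters(text) {
  const counts = new Map();
  // Assumption: case is folded, since the output above shows only lowercase.
  for (const char of text.toLowerCase()) {
    counts.set(char, (counts.get(char) ?? 0) + 1);
  }
  // Array.prototype.sort is stable, so ties keep first-seen order.
  return [...counts.entries()].sort(([, a], [, b]) => b - a);
}

console.log(countCharacters("Fear is the mind-killer."));
```

A `Map` is used instead of a plain object so that characters such as spaces and newlines are handled as ordinary keys and insertion order is preserved.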
Ignoring spaces, this roughly matches the frequencies of characters in average English text.
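The top nine letters in the novel extract (ordered by count, ignoring spaces) and the first nine letters listed in letter-frequencies.json are as follows:
Dune : E T I A L N R H O
Average: E M A H R G I B O
Compared position by position, the letters E and O are exact matches, A is off by one, and the others are further apart. Note that the Average row follows the key order of the JSON file, which is not sorted by frequency (M at 3.0129 is listed before A at 8.4966), so sorting the file by value first would give a fairer comparison. The extract from the novel is also a very small sample, which possibly explains some of the difference in order of appearance.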
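The position-by-position comparison can be made explicit with a short sketch. `positionOffsets` is a hypothetical helper, and the two arrays are the rows exactly as listed above.

```javascript
// Hypothetical helper: for each letter in `ranking`, reports how many places
// its position in `reference` differs from its position in `ranking`
// (null if the letter does not appear in `reference`).
function positionOffsets(ranking, reference) {
  return ranking.map((letter, index) => {
    const refIndex = reference.indexOf(letter);
    return [letter, refIndex === -1 ? null : refIndex - index];
  });
}

const dune = "ETIALNRHO".split("");
const average = "EMAHRGIBO".split("");

console.log(positionOffsets(dune, average));
// E and O have offset 0, A has offset -1, R -2, I 4, H -4;
// T, L and N are null because they are not in the Average row.
```

An offset of 0 is an exact match, and the sign shows whether the letter sits earlier (negative) or later (positive) in the reference row.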