@evilpacket
Created July 11, 2013 07:17
English letter frequencies in json format
{
"a": 8.167,
"b": 1.492,
"c": 2.782,
"d": 4.253,
"e": 12.702,
"f": 2.228,
"g": 2.015,
"h": 6.094,
"i": 6.966,
"j": 0.153,
"k": 0.772,
"l": 4.025,
"m": 2.406,
"n": 6.749,
"o": 7.507,
"p": 1.929,
"q": 0.095,
"r": 5.987,
"s": 6.327,
"t": 9.056,
"u": 2.758,
"v": 0.978,
"w": 2.360,
"x": 0.150,
"y": 1.974,
"z": 0.074
}
@handcoding

May I ask where you got this data?

(From what I can tell, these numbers all look highly plausible, but I’d feel better if I knew where the data came from.)

@evilpacket
Author

@handcoding no clue after 11 years. I suspect I would have stolen it from a Wikipedia source, but I can't be certain at this point.

@nanoscopic

nanoscopic commented Feb 12, 2025

This frequency table doesn't appear to be accurate for 'json keys' in my opinion, because of the high usage of "size" in tech. That alone seems like it would put 'z' at a higher frequency than 'q'.

Here is my attempt at this:
'A', 'B', 'C', 'D', 'E',
'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O',
'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y',
'Z', '_', '-'

68, 13, 29, 39, 110,
11, 15, 7, 64, 10,
10, 32, 30, 48, 43,
26, 5, 57, 61, 71,
23, 10, 10, 10, 18,
20, 41, 10

This is derived by taking a random sample of 100 reasonable keys, counting how often each character appears, and then adjusting the counts a bit.

Modifications:
Add a high likelihood of underscore.
Add a lesser arbitrarily chosen amount for dash.
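The counting step described above could be sketched roughly as follows. This is only an illustration: the `sample_keys` list and the `count_key_chars` helper are hypothetical names, and the keys shown are not the 100 keys actually sampled.

```python
import string
from collections import Counter

# Characters tracked in the table above: A-Z plus underscore and dash.
ALLOWED = set(string.ascii_uppercase + "_-")

# Hypothetical sample keys, for illustration only.
sample_keys = ["user_id", "file_size", "created_at", "content-type"]

def count_key_chars(keys):
    """Count letters (folded to uppercase), underscores, and dashes."""
    counts = Counter()
    for key in keys:
        for ch in key.upper():
            if ch in ALLOWED:  # digits and other characters are ignored
                counts[ch] += 1
    return counts

counts = count_key_chars(sample_keys)
print(counts.most_common(5))
```

Any manual adjustment (such as raising the underscore count) would then be applied to `counts` before normalizing.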

I'm assuming keys use only uppercase, lowercase, numbers, underscore, and dash. (JSON itself allows any string as a key; this is just the common convention.)

Here are my frequencies, normalized to percentages:
A 7.63
B 1.46
C 3.25
D 4.38
E 12.35
F 1.23
G 1.68
H 0.79
I 7.18
J 1.12
K 1.12
L 3.59
M 3.37
N 5.39
O 4.83
P 2.92
Q 0.56
R 6.40
S 6.85
T 7.97
U 2.58
V 1.12
W 1.12
X 1.12
Y 2.02
Z 2.24
_ 4.60
- 1.12
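For reference, the normalized values above can be reproduced from the raw counts by dividing each count by the total and scaling to a percentage. A minimal sketch, where the variable names are mine rather than part of the original comment:

```python
# Symbols and raw counts, in the same order as listed above.
symbols = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ_-")
raw_counts = [
    68, 13, 29, 39, 110,
    11, 15, 7, 64, 10,
    10, 32, 30, 48, 43,
    26, 5, 57, 61, 71,
    23, 10, 10, 10, 18,
    20, 41, 10,
]

total = sum(raw_counts)  # 891 for the counts above

# Percentage share of each symbol, rounded to two decimals.
normalized = {
    sym: round(100 * count / total, 2)
    for sym, count in zip(symbols, raw_counts)
}

for sym, pct in normalized.items():
    print(f"{sym} {pct:.2f}")
```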

The biggest difference I actually see is that the Q frequency in the original list above is abnormally low.
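One way to check which letters differ most between the two tables is to compute the gap and the ratio per letter. The sketch below copies both sets of values from this page; the names `english`, `json_keys`, `gaps`, and `ratios` are illustrative. Note that the second table also assigns weight to `_` and `-`, so its letter percentages are scaled down slightly relative to a letters-only table.

```python
import string

letters = string.ascii_uppercase

# English letter frequencies from the gist (percent), A through Z.
english = dict(zip(letters, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))

# Normalized JSON-key estimates from the comment (percent), A through Z.
json_keys = dict(zip(letters, [
    7.63, 1.46, 3.25, 4.38, 12.35, 1.23, 1.68, 0.79, 7.18,
    1.12, 1.12, 3.59, 3.37, 5.39, 4.83, 2.92, 0.56, 6.40,
    6.85, 7.97, 2.58, 1.12, 1.12, 1.12, 2.02, 2.24,
]))

# Absolute gap in percentage points and relative ratio, per letter.
gaps = {ch: round(json_keys[ch] - english[ch], 3) for ch in letters}
ratios = {ch: json_keys[ch] / english[ch] for ch in letters}

# Letters ordered from largest to smallest absolute gap.
by_abs = sorted(letters, key=lambda ch: abs(gaps[ch]), reverse=True)

for ch in by_abs[:5]:
    print(f"{ch}: gap {gaps[ch]:+.3f} points, ratio {ratios[ch]:.2f}x")
```

With the values as posted, Q is about six times more frequent in the key estimate than in English text, though H shows the largest absolute gap and Z the largest ratio.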
