Created
July 11, 2013 07:17
English letter frequencies in json format
{
  "a": 8.167,
  "b": 1.492,
  "c": 2.782,
  "d": 4.253,
  "e": 12.702,
  "f": 2.228,
  "g": 2.015,
  "h": 6.094,
  "i": 6.966,
  "j": 0.153,
  "k": 0.772,
  "l": 4.025,
  "m": 2.406,
  "n": 6.749,
  "o": 7.507,
  "p": 1.929,
  "q": 0.095,
  "r": 5.987,
  "s": 6.327,
  "t": 9.056,
  "u": 2.758,
  "v": 0.978,
  "w": 2.360,
  "x": 0.150,
  "y": 1.974,
  "z": 0.074
}
In my opinion, this frequency table doesn't look accurate for JSON keys, because "size" is used so often in tech. That alone seems like it would put 'z' at a higher frequency than 'q'.
Here is my attempt at this:
'A', 'B', 'C', 'D', 'E',
'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O',
'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y',
'Z', '_', '-'
68, 13, 29, 39, 110,
11, 15, 7, 64, 10,
10, 32, 30, 48, 43,
26, 5, 57, 61, 71,
23, 10, 10, 10, 18,
20, 41, 10
I derived this by taking a random sample of 100 reasonable keys, counting how often each character occurs, and then adjusting the counts a bit.
Modifications:
Add a high likelihood of underscore.
Add a lesser arbitrarily chosen amount for dash.
By convention, JSON keys use only uppercase letters, lowercase letters, digits, underscores, and dashes (the JSON format itself accepts any string as a key).
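The counting step described above could be sketched roughly as follows. This is only an illustration: the actual 100-key sample is not given in the comment, so `SAMPLE_KEYS` is a hypothetical stand-in, and `count_key_characters` is a name made up here. The manual adjustments for underscore and dash are not reproduced.

```python
from collections import Counter

# Hypothetical sample; the real 100-key sample from the comment is not available.
SAMPLE_KEYS = ["user_id", "file_size", "created-at", "name"]

# The 28 symbols tallied in the comment: A-Z plus underscore and dash.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ_-"


def count_key_characters(keys):
    """Count case-insensitive occurrences of each tracked symbol across all keys."""
    counts = Counter({ch: 0 for ch in ALPHABET})  # start every symbol at zero
    for key in keys:
        for ch in key.upper():
            if ch in counts:  # ignore digits and any other untracked characters
                counts[ch] += 1
    return counts


counts = count_key_characters(SAMPLE_KEYS)
for symbol in ALPHABET:
    print(symbol, counts[symbol])
```

With a larger sample, a single frequent key such as "size" is enough to lift 'Z' above 'Q', which is the effect the comment describes.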
Here are my frequencies normalized to percentages:
A 7.63
B 1.46
C 3.25
D 4.38
E 12.35
F 1.23
G 1.68
H 0.79
I 7.18
J 1.12
K 1.12
L 3.59
M 3.37
N 5.39
O 4.83
P 2.92
Q 0.56
R 6.40
S 6.85
T 7.97
U 2.58
V 1.12
W 1.12
X 1.12
Y 2.02
Z 2.24
_ 4.60
- 1.12
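The normalization appears to be each raw count divided by the total of all 28 counts (891, including underscore and dash), expressed as a percentage. A minimal sketch under that assumption, using the counts posted above (`normalize` is a name chosen here for illustration):

```python
# Symbols and raw counts exactly as posted in the comment above.
LETTERS = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ_-")
COUNTS = [
    68, 13, 29, 39, 110,
    11, 15, 7, 64, 10,
    10, 32, 30, 48, 43,
    26, 5, 57, 61, 71,
    23, 10, 10, 10, 18,
    20, 41, 10,
]


def normalize(letters, counts):
    """Convert raw counts to percentages of the total, rounded to two decimals."""
    total = sum(counts)
    return {letter: round(100 * count / total, 2)
            for letter, count in zip(letters, counts)}


percentages = normalize(LETTERS, COUNTS)
for letter in LETTERS:
    print(f"{letter} {percentages[letter]:.2f}")
```

Under this assumption the output matches the list above (A 7.63, E 12.35, Z 2.24, _ 4.60), and the dash, which has the same count as J, comes out at 1.12.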
The biggest difference I actually see is that the Q frequency in the original table above is abnormally low.