Created
December 6, 2019 16:17
<?php
$limit = 1000000;
$counts = [
    '1' => 0,
    '2' => 0,
    '3' => 0,
    '4' => 0,
    '5' => 0,
    '6' => 0,
    '7' => 0,
    '8' => 0,
    '9' => 0
];
for ($current = 0; $current < $limit; $current++) {
    //$number = (string) rand(1, getrandmax());
    $number = (string) random_int(1, PHP_INT_MAX);
    $counts[$number[0]]++;
}
foreach ($counts as $number => $count) {
    echo $number . ': ' . ($count/$limit)*100 . '%' . PHP_EOL;
}
Yup, that's the issue: https://twitter.com/tjlytle/status/1202990988240850947
The max int size cuts out most of the numbers that start with 9 (those near the max).
If you set the min and max to 11--99, you get a uniform distribution:
1: 11.1096%
2: 11.1252%
3: 11.058%
4: 11.1275%
5: 11.0732%
6: 11.1191%
7: 11.1153%
8: 11.1355%
9: 11.1366%
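To see how much the max int size skews things without sampling at all, here is a minimal sketch that counts exactly how many integers in a range start with each digit. The function name `leadingDigitCounts` is made up for illustration, and the sketch assumes a 64-bit PHP build, where `PHP_INT_MAX` is 9223372036854775807.

```php
<?php
// Hypothetical helper: exact count of integers in [1, $max] per leading digit,
// computed arithmetically rather than by generating random numbers.
function leadingDigitCounts(int $max): array
{
    $counts = array_fill(1, 9, 0);
    // $scale walks through 1, 10, 100, ... (one pass per number length).
    for ($scale = 1; $scale <= $max; $scale *= 10) {
        for ($digit = 1; $digit <= 9; $digit++) {
            $low = $digit * $scale; // smallest number of this length with this leading digit
            if ($low > $max) {
                break;
            }
            // The block normally holds $scale numbers; cap it at $max without overflowing.
            $counts[$digit] += min($scale - 1, $max - $low) + 1;
        }
        if ($scale > intdiv($max, 10)) {
            break; // the next power of ten would exceed $max (and could overflow)
        }
    }
    return $counts;
}

foreach (leadingDigitCounts(PHP_INT_MAX) as $digit => $count) {
    printf("%d: %.2f%%\n", $digit, $count / PHP_INT_MAX * 100);
}
```

On a 64-bit build this should come out to roughly 12% each for digits 1 through 8 and under 4% for 9, because the 19-digit numbers starting with 9 stop shortly after 9.22 x 10^18.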
Based on what I read, it looks like this implementation is wrong (those are the percentages of occurrences); it should look like this:
<?php
$limit = 1000000;
$counts = [
    '1' => 0,
    '2' => 0,
    '3' => 0,
    '4' => 0,
    '5' => 0,
    '6' => 0,
    '7' => 0,
    '8' => 0,
    '9' => 0
];
for ($current = 0; $current < $limit; $current++) {
    //$number = (string) rand(1, getrandmax());
    $number = (string) random_int(1, PHP_INT_MAX);
    $counts[$number[0]]++;
}
foreach ($counts as $number => $count) {
    echo $number . ': ' . benfordLaw($number) . PHP_EOL;
}
function benfordLaw($leading) {
    return (log(1 + (1 / $leading)) / log(10));
}
Output
1: 0.30102999566398
2: 0.17609125905568
3: 0.1249387366083
4: 0.096910013008056
5: 0.079181246047625
6: 0.066946789630613
7: 0.057991946977687
8: 0.051152522447381
9: 0.045757490560675
See implementation and expected outputs in other languages: https://rosettacode.org/wiki/Benford%27s_law#JavaScript
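The values above are the theoretical Benford shares, not anything measured from the random numbers. A rough sketch for putting the two side by side might look like the following; `benfordExpected` and `observedShares` are hypothetical names, the sample size of 100000 is arbitrary, and arrow functions assume PHP 7.4 or later.

```php
<?php
// Expected share of leading digit $digit under Benford's law: log10(1 + 1/d).
function benfordExpected(int $digit): float
{
    return log10(1 + 1 / $digit);
}

// Observed share of each leading digit among $samples integers
// drawn uniformly from [1, $max] with random_int().
function observedShares(int $samples, int $max): array
{
    $counts = array_fill(1, 9, 0);
    for ($i = 0; $i < $samples; $i++) {
        $leading = (int) ((string) random_int(1, $max))[0];
        $counts[$leading]++;
    }
    // array_map() keeps the 1-9 keys when given a single array.
    return array_map(fn ($count) => $count / $samples, $counts);
}

foreach (observedShares(100000, PHP_INT_MAX) as $digit => $share) {
    printf(
        "%d: observed %.2f%%, Benford %.2f%%\n",
        $digit,
        $share * 100,
        benfordExpected($digit) * 100
    );
}
```

Uniformly drawn integers are not expected to match the Benford column, so a mismatch here says more about the range being sampled than about a bug in the tally.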
And you can start to see what's going on here: you can't bin them by the starting digit and expect to see a uniform distribution.
So, for example, if you're generating numbers between 1 and 10, you're counting 2-9 in their own buckets (each gets the roughly 10% probability you'd expect from chance). But the outcomes for 1 and 10 are combined in the "1" bucket, so it gets about 20%.
Similarly between 1 and 50. There are a lot of numbers in that range that start with the digits 1, 2, 3, and 4 (I think 11 of them for each). But for 5, 6, 7, 8, 9 there are fewer: 5 is just 5 and 50, 6 is only 6, etc.
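For small ranges like these, the bucket sizes can be checked by brute force. This is just a sketch; `tallyLeadingDigits` is a made-up helper name.

```php
<?php
// Hypothetical helper: tally the leading digit of every integer in [1, $max].
function tallyLeadingDigits(int $max): array
{
    $counts = array_fill(1, 9, 0);
    for ($n = 1; $n <= $max; $n++) {
        // The first character of the decimal string is the leading digit.
        $counts[(int) ((string) $n)[0]]++;
    }
    return $counts;
}

foreach ([10, 50] as $max) {
    // Prints the counts for digits 1 through 9, in order.
    echo "1 to $max: " . implode(' ', tallyLeadingDigits($max)) . PHP_EOL;
}
// 1 to 10: 2 1 1 1 1 1 1 1 1
// 1 to 50: 11 11 11 11 2 1 1 1 1
```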