This is an experimental tool to evaluate the quality of the estimations produced by the HyperLogLog algorithm described in the Flajolet et al. paper, with and without the large range correction (§4 of the paper).
This correction is only enabled for cardinalities beyond 143M, so any input below that will return the same result twice.
go mod tidy
go run main.go -count 3000000000
$ go run main.go -card 150000000 # 150M
Without correction: 159441109 (inaccuracy: +6%)
With correction: 162475901 (inaccuracy: +8%)
$ go run main.go -card 500000000 # 500M
Without correction: 532421332 (inaccuracy: +6%)
With correction: 568430738 (inaccuracy: +14%)
$ go run main.go -card 1000000000 # 1G
Without correction: 1014882872 (inaccuracy: +1%)
With correction: 1157814863 (inaccuracy: +16%)
$ go run main.go -card 2000000000 # 2G
Without correction: 1908444304 (inaccuracy: -5%)
With correction: 2523750484 (inaccuracy: +26%)
$ go run main.go -card 2500000000 # 2.5G
Without correction: 2347386494 (inaccuracy: -6%)
With correction: 3396700455 (inaccuracy: +36%)
$ go run main.go -card 3000000000 # 3G
Without correction: 2766452874 (inaccuracy: -8%)
With correction: 4437335342 (inaccuracy: +48%)
$ go run main.go -card 4000000000 # 4G
Without correction: 3455803455 (inaccuracy: -14%)
With correction: 7012793615 (inaccuracy: +75%)
$ go run main.go -card 5000000000 # 5G
Without correction: 3948874477 (inaccuracy: -21%)
With correction: 10816841717 (inaccuracy: +116%)
$ go run main.go -card 6000000000 # 6G
Without correction: 4466485931 (inaccuracy: -26%)
With correction: 0 (inaccuracy: -100%) # exceeded the domain of the log function
The DataDog/hyperloglog
implementation is used, with a minor modification to be able to estimate with and without the large range correction:
} else if estimate > 1.0/30.0*exp32 {
- // Large range correction
- estimate = -exp32 * math.Log(1-estimate/exp32)
+ if withLargeRangeCorrection {
+ estimate = -exp32 * math.Log(1-estimate/exp32)
+ }
}
return uint64(estimate)
}