The HyperLogLog (HLL) algorithm is a probabilistic data structure used to approximate the number of distinct elements in a multiset while using significantly less memory than exact methods Source 3. It is an extension of the earlier LogLog algorithm, which itself derives from the 1984 Flajolet–Martin algorithm Source 3.
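The following is a minimal Python sketch of such an estimator, for illustration only. Several details are assumptions of this sketch and do not come from any particular library: the class name `HyperLogLog`, the precision parameter `p` (which gives 2^p registers), and the use of a truncated SHA-1 digest as the hash. It includes the α_m bias constant and the small-range (linear counting) correction from the original HyperLogLog paper. It omits the large-range correction and later refinements.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal, illustrative HyperLogLog (not production-grade)."""

    def __init__(self, p: int = 10) -> None:
        # p index bits -> m = 2**p registers (buckets).
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m from the HyperLogLog paper.
        if self.m == 16:
            self.alpha = 0.673
        elif self.m == 32:
            self.alpha = 0.697
        elif self.m == 64:
            self.alpha = 0.709
        else:
            self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item) -> int:
        # Assumption: SHA-1 truncated to 64 bits stands in for a uniform hash.
        # Items are hashed via str(), so e.g. 1 and "1" collide (fine for a demo).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item) -> None:
        x = self._hash64(item)
        # The first p bits of the hash select the register (bucket).
        idx = x >> (64 - self.p)
        # The remaining 64 - p bits give the rank: leading zeros + 1.
        w = x & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - w.bit_length() + 1
        # Each register keeps the maximum rank seen in its bucket.
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Raw estimate: alpha_m * m^2 / sum(2^-M[j]) (a normalized harmonic mean).
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting on empty registers.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=10)
    for i in range(100_000):
        hll.add(i)
    # Expect an estimate near 100000 (standard error about 1.04 / sqrt(1024)).
    print(round(hll.count()))
```

With `p=10` the sketch stores 1,024 small integers no matter how many items are added. An exact `set` of 100,000 integers, by contrast, grows with the number of distinct elements. Adding the same item again never changes a register, so duplicates do not inflate the estimate.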
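The main idea behind the HLL algorithm is to estimate the cardinality of a set from the maximum number of leading zeros observed in the binary representations of its hashed elements Source 0. For uniformly distributed hash values, a run of at least k leading zeros occurs with probability 2^-k, so observing such a run suggests that roughly 2^k distinct values have been seen. To achieve this, the following steps are taken: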
- Apply a hash function to each element in the original multiset to obtain a multiset of uniformly distributed random numbers with the same cardinality as the original multiset Source 3.
- Split the multiset into multiple subsets (bu