Last active
January 31, 2017 09:17
Algebird HyperLogLog
import com.twitter.algebird.HyperLogLogMonoid
// implicit Int/Long => Array[Byte] conversions, needed for the numeric variant below
import com.twitter.algebird.HyperLogLog._

// define test data
val data = Seq("aaa", "bbb", "ccc")

// create the Algebird HLL monoid (bits = 10 gives 2^10 registers)
val hll = new HyperLogLogMonoid(bits = 10)

// convert each data element to its own HLL
val hlls = data.map { str =>
  val bytes = str.getBytes("utf-8")
  hll.create(bytes)
}

// or, for numeric data (distinct names so both variants compile in one scope):
val intData = List(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
val intHlls = intData.map { hll.create(_) }

// merge the per-element HLLs into a single HLL
val merged = hll.sum(hlls)

// WARN: don't use merged.size: it is the number of registers (2^bits), not the estimated count
// get the estimated distinct count from the merged HLL
println("estimate count: " + hll.sizeOf(merged).estimate)
// or
println("estimate count: " + merged.approximateSize.estimate)
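The `bits` parameter trades memory for accuracy. As a rough guide, the textbook HyperLogLog standard error is about 1.04 / sqrt(2^bits). The sketch below computes that figure for `bits = 10`; `hllStdError` is a hypothetical helper written for illustration, not part of Algebird, and it uses only the Scala standard library. The register count it prints is assumed to be the same value that `merged.size` reports, which is why `size` should not be read as a cardinality estimate.

```scala
// Hypothetical helper, not part of Algebird: the textbook HyperLogLog
// standard error for a sketch with 2^bits registers.
def hllStdError(bits: Int): Double = {
  val registers = 1 << bits
  1.04 / math.sqrt(registers.toDouble)
}

val bits = 10
val registers = 1 << bits          // number of registers, not a distinct count
val stdError = hllStdError(bits)   // relative error expected from the estimate

println(f"bits=$bits registers=$registers stdError=${stdError * 100}%.2f%%")
```

Raising `bits` by 2 quadruples the register count and halves the expected relative error, so pick the smallest value whose error is acceptable for the counts being estimated.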