Last active
December 28, 2015 14:29
jeroenvandijk/7514963
Alternative Hyperloglog implementation for Cascalog
;; Possibly more performant version of http://screen6.github.io/blog/2013/11/13/hyperloglog-with-cascalog.html
;; due to the absence of multimethods.
;; Needs to be tested.
;; Assumes the setup from the blog post: `c` presumably aliases cascalog.api, HyperLogLog is
;; imported, and new-hyperloglog is defined. None of these are defined in this gist.
;; Note: the protocol method `merge` shadows clojure.core/merge in this namespace.

(defprotocol IHyperLogLogMerge
  (hyperloglog-val [this])
  (merge [this other])
  (merge-with-hyperloglog [this other-hll]))

(extend-protocol IHyperLogLogMerge
  nil
  (hyperloglog-val [this] nil)
  (merge-with-hyperloglog [this other-hll] other-hll)
  (merge [this other] other)

  Object
  (hyperloglog-val [this] (doto (new-hyperloglog) (.offer this)))
  ;; doto returns the HLL; .offer on its own returns a boolean
  (merge-with-hyperloglog [this other-hll] (doto other-hll (.offer this)))
  (merge [this other]
    (merge (hyperloglog-val other) this))

  HyperLogLog
  (hyperloglog-val [this] this)
  ;; doto returns the merged HLL rather than the return value of .addAll
  (merge-with-hyperloglog [this other-hll] (doto this (.addAll other-hll)))
  (merge [this other]
    (merge-with-hyperloglog other this)))

;; As in the original blog post
(defn merge-n
  ([h1] h1)
  ([h1 h2] (merge h1 h2))
  ([h1 h2 & more]
    (reduce merge (merge h1 h2) more)))

(c/defparallelagg parallel-sum
  :init-var #'identity
  :combine-var #'merge-n)
This can't actually be done as is using defparallelagg, can it? Did you run this and have it work? Since the :init-var is identity, you get a string returned instead of an HLL. When I try this I get an exception that there is no matching field "cardinality" for java.lang.String.
I'm not 100% sure, but I think that the :init-var function of a defparallelagg has to return the same type as the :combine-var function in order to work correctly. At least that's what I understood.
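The point raised in the two comments above can be illustrated with a rough, dependency-free sketch. It assumes that a parallel aggregator behaves roughly like "apply the init function to every value, then combine the partial results pairwise"; that model is a simplification, not Cascalog's actual implementation. `Sketch`, `->sketch-of`, `combine` and `simulate-agg` are hypothetical names, and a plain set stands in for the HyperLogLog.

```clojure
;; Stand-in for a HyperLogLog: a plain Clojure set wrapped in a record.
;; (Hypothetical; the gist uses a real HyperLogLog instead.)
(defrecord Sketch [items])

(defn ->sketch-of
  "Init step that always returns a Sketch, similar in spirit to hyperloglog-val."
  [x]
  (if (instance? Sketch x) x (->Sketch #{x})))

(defn combine
  "Combine step: accepts raw values or Sketches and always returns a Sketch."
  [a b]
  (->Sketch (into (:items (->sketch-of a)) (:items (->sketch-of b)))))

(defn simulate-agg
  "Simplified model of a parallel aggregator: init every value, then combine."
  [init-fn values]
  (reduce combine (map init-fn values)))

;; With identity as the init step, a group holding a single value is never
;; passed through combine, so the raw string comes back instead of a Sketch.
(def single-identity (simulate-agg identity ["a"]))

;; With an init step that returns a Sketch, the result type is always a Sketch.
(def single-sketch (simulate-agg ->sketch-of ["a"]))

;; With several values, combine runs and even the identity variant yields a Sketch.
(def many-identity (simulate-agg identity ["a" "b" "a"]))
```

Under this model, `single-identity` is the raw string while `single-sketch` is a `Sketch`, which would be consistent with the `java.lang.String` exception reported above. If the same reasoning carries over to the gist, using `#'hyperloglog-val` instead of `#'identity` as the `:init-var` might be worth trying; that is untested here.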