Last active
December 28, 2015 14:29
Alternative Hyperloglog implementation for Cascalog
;; Possibly more performant than the version at
;; http://screen6.github.io/blog/2013/11/13/hyperloglog-with-cascalog.html
;; due to the absence of multimethods (dispatch goes through a protocol instead).
;; Needs to be tested.
;; Assumes `new-hyperloglog` from the blog post, the HyperLogLog class imported,
;; and cascalog.api aliased as `c`.

(defprotocol IHyperLogLogMerge
  (hyperloglog-val [this])
  (merge [this other])
  (merge-with-hyperloglog [this other-hll]))

(extend-protocol IHyperLogLogMerge
  nil
  (hyperloglog-val [this] nil)
  (merge-with-hyperloglog [this other-hll] other-hll)
  (merge [this other] other)

  Object
  (hyperloglog-val [this] (doto (new-hyperloglog) (.offer this)))
  ;; .offer returns a boolean, so doto is used to return the HyperLogLog itself
  (merge-with-hyperloglog [this other-hll] (doto other-hll (.offer this)))
  (merge [this other]
    (merge (hyperloglog-val other) this))

  HyperLogLog
  (hyperloglog-val [this] this)
  ;; doto returns the merged HyperLogLog regardless of what .addAll returns
  (merge-with-hyperloglog [this other-hll] (doto this (.addAll other-hll)))
  (merge [this other]
    (merge-with-hyperloglog other this)))

;; As in the original blog post
(defn merge-n
  ([h1] h1)
  ([h1 h2] (merge h1 h2))
  ([h1 h2 & more]
    (reduce merge (merge h1 h2) more)))

(c/defparallelagg parallel-sum
  :init-var #'identity
  :combine-var #'merge-n)
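Since the gist is marked as needing tests, the dispatch logic can be tried at a plain REPL without stream-lib or Cascalog. The sketch below is only a stand-in, not the gist's code: it substitutes `java.util.HashSet` for HyperLogLog, and the names `ISketchMerge`, `sketch-val`, `merge-sketch` and `merge-into-sketch` are hypothetical. `HashSet`'s `.add` and `.addAll` return booleans rather than the set, so `doto` is used to hand back the set itself.

```clojure
(import 'java.util.HashSet)

;; Hypothetical protocol mirroring the three-method shape of the gist,
;; with HashSet standing in for HyperLogLog.
(defprotocol ISketchMerge
  (sketch-val [this])
  (merge-sketch [this other])
  (merge-into-sketch [this sketch]))

(extend-protocol ISketchMerge
  nil
  (sketch-val [this] nil)
  (merge-sketch [this other] other)
  (merge-into-sketch [this sketch] sketch)

  Object
  ;; A raw value becomes a one-element sketch.
  (sketch-val [this] (doto (HashSet.) (.add this)))
  ;; Promote the other side to a sketch, then let it absorb this value.
  (merge-sketch [this other] (merge-sketch (sketch-val other) this))
  (merge-into-sketch [this sketch] (doto sketch (.add this)))

  HashSet
  (sketch-val [this] this)
  ;; Second dispatch: the other argument decides how it joins this sketch.
  (merge-sketch [this other] (merge-into-sketch other this))
  (merge-into-sketch [this sketch] (doto this (.addAll sketch))))

;; Raw values, nil and sketches can be mixed freely.
(set (reduce merge-sketch ["a" "b" "c" "a"]))
;; => #{"a" "b" "c"}
```

The two-step dispatch (first on `this`, then on the other argument) is what replaces the multimethod dispatch on both argument types.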
This actually can't be done as is using defparallelagg, can it? Did you run this and have it work? Since the :init-var is identity, you'll get a string returned instead of an HLL. When I try to do this, I get an exception that there is no matching field "cardinality" for java.lang.String.
I'm not 100% sure, but I think that the :init-var function of a defparallelagg has to return the same type as the :combine-var function in order to work correctly. At least that's what I understood.
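The type mismatch described in these two comments can be illustrated with a simplified model. This is a sketch under stated assumptions, not Cascalog's actual execution: a plain Clojure set stands in for the HLL, and `init-sketch`, `combine-sketches` and `simulate-parallel-agg` are hypothetical names. It only shows that a group with a single value never reaches the combine step, so whatever the init function returns is the final result.

```clojure
;; Stand-in for a HyperLogLog: a plain Clojure set (assumption for illustration).
(defn init-sketch
  "Wraps a single raw value in the aggregate type."
  [x]
  #{x})

(defn combine-sketches
  "Merges two aggregates of the same type."
  [a b]
  (into a b))

(defn simulate-parallel-agg
  "Simplified init-then-combine model: every value goes through init-fn,
  then the partial results are reduced with combine-fn."
  [init-fn combine-fn values]
  (reduce combine-fn (map init-fn values)))

;; With a wrapping init, even a one-element group yields the aggregate type.
(simulate-parallel-agg init-sketch combine-sketches ["user-1"])
;; => #{"user-1"}

;; With identity as init, a one-element group is never combined,
;; so the raw string comes back unchanged.
(simulate-parallel-agg identity combine-sketches ["user-1"])
;; => "user-1"
```

In this model, making the init function return the aggregate type is what guarantees a uniform result type for every group size.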
Jeroen, to let you know, I've tested it and implemented it in the latest revision of the code supplied with the article: https://gist.github.com/ilya-pi/7319327/75dcbbe3d9086b47b121fb2892f4efaedb41d7f8
Although the merge function is redundant with the aggregateop approach, it will definitely be of great use later on!