@jeroenvandijk
Last active December 28, 2015 14:29
Alternative Hyperloglog implementation for Cascalog
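;; Possibly more performant than the version at
;; http://screen6.github.io/blog/2013/11/13/hyperloglog-with-cascalog.html
;; because it dispatches through a protocol instead of multimethods.
;; Needs to be tested.
;;
;; Assumptions: HyperLogLog is stream-lib's class and new-hyperloglog is the
;; constructor helper from the original blog post. Defining a protocol method
;; named merge replaces clojure.core/merge in this namespace (with a warning);
;; add (:refer-clojure :exclude [merge]) to the ns form to silence it.
(defprotocol IHyperLogLogMerge
  (hyperloglog-val [this])
  (merge [this other])
  (merge-with-hyperloglog [this other-hll]))

(extend-protocol IHyperLogLogMerge
  nil
  (hyperloglog-val [this] nil)
  (merge-with-hyperloglog [this other-hll] other-hll)
  (merge [this other] other)

  Object
  (hyperloglog-val [this] (doto (new-hyperloglog) (.offer this)))
  ;; .offer returns a boolean, so doto is used to return the HLL itself
  (merge-with-hyperloglog [this other-hll] (doto other-hll (.offer this)))
  (merge [this other]
    (merge (hyperloglog-val other) this))

  HyperLogLog
  (hyperloglog-val [this] this)
  ;; .addAll returns nothing, so doto is used to return the merged HLL
  (merge-with-hyperloglog [this other-hll] (doto this (.addAll other-hll)))
  (merge [this other]
    (merge-with-hyperloglog other this)))

;; As in the original blog post
(defn merge-n
  ([h1] h1)
  ([h1 h2] (merge h1 h2))
  ([h1 h2 & more]
    (reduce merge (merge h1 h2) more)))

(c/defparallelagg parallel-sum
  :init-var #'identity
  :combine-var #'merge-n)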
@ilya-pimenov

Nice!

@ilya-pimenov

Jeroen, just to let you know: I've tested it and included it in the latest revision of the code supplied with the article: https://gist.github.com/ilya-pi/7319327/75dcbbe3d9086b47b121fb2892f4efaedb41d7f8

Although the merge function is redundant with the aggregateop approach, it will definitely be of great use later on!

@dkincaid

This actually can't be done as is using defparallelagg, can it? Did you run this and have it work? Since the :init-var is identity, you'll get a string returned instead of an HLL. When I try this, I get an exception that there is no matching field "cardinality" for java.lang.String.

I'm not 100% sure, but I think that the :init-var function of a defparallelagg has to return the same type as the :combine-var function in order to work correctly. At least that's what I understood.
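
To make the type concern above concrete, here is a minimal, dependency-free sketch. It is not Cascalog and does not use a real HyperLogLog: a plain Clojure set stands in for the sketch, and `init-sketch`, `combine-sketch` and `simulate-parallel-agg` are hypothetical names that only model the idea "apply init to every value, then fold the partial results with combine". How Cascalog actually schedules these calls may differ.

```clojure
;; Stand-in model only: a Clojure set plays the role of the HLL sketch.

(defn init-sketch
  "Hypothetical :init-var: wraps one raw value in a (stand-in) sketch."
  [x]
  #{x})

(defn combine-sketch
  "Hypothetical :combine-var: merges two partial sketches."
  [s1 s2]
  (into s1 s2))

(defn simulate-parallel-agg
  "Simplified model of a parallel aggregator: run init on every value,
  then fold the partial results pairwise with combine."
  [init combine values]
  (reduce combine (map init values)))

;; With identity as init, a group holding a single value never reaches
;; combine, so the raw string comes back unchanged.
(simulate-parallel-agg identity combine-sketch ["user-1"])
;; => "user-1"

;; With an init that already returns a sketch, every group yields a sketch.
(simulate-parallel-agg init-sketch combine-sketch ["user-1"])
;; => #{"user-1"}
(simulate-parallel-agg init-sketch combine-sketch ["user-1" "user-2" "user-1"])
;; => #{"user-1" "user-2"}
```

If this model matches what Cascalog does, one thing to try in the gist would be `:init-var #'hyperloglog-val` instead of `#'identity`, so that every partial result is already an HLL. That is an untested suggestion, not a confirmed fix.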
