@erasmas
Last active August 29, 2015 14:09
Collecting unique values from multiple columns in Cascalog. https://groups.google.com/forum/#!topic/cascalog-user/CjpApzUiwHw
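(use 'cascalog.api)                    ; ??<-, mapfn, defparallelagg
(require 'clojure.set 'clojure.string)

;; Parallel aggregator that collects the distinct values of a field into
;; a set: each value starts as a singleton set, partial sets are merged
;; with `into`, and the final set is returned unchanged.
(defparallelagg collect-set
  :init-var (mapfn [s] #{s})
  :combine-var into
  :present-var identity)

(let [;; Union the two per-id sets and join them into one string.
      set->string (mapfn [s1 s2]
                    (clojure.string/join "," (clojure.set/union s1 s2)))]
  (??<- [?id ?fruits]
        ([["1" "banana" "grape"]
          ["1" "apple" "apple"]
          ["1" "apple" "lemon"]
          ["2" "orange" "kiwi"]
          ["2" "kiwi" "apple"]] ?id ?fruit1 ?fruit2)
        (collect-set ?fruit1 :> ?fruits1)   ; distinct ?fruit1 values per ?id
        (collect-set ?fruit2 :> ?fruits2)   ; distinct ?fruit2 values per ?id
        (set->string ?fruits1 ?fruits2 :> ?fruits)))
;;=> (["1" "apple,banana,grape,lemon"] ["2" "apple,orange,kiwi"])
;; Element order inside each string follows set iteration order and is not guaranteed.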
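To make the mechanics of `collect-set` concrete without a Hadoop or Cascalog setup, here is a minimal plain-Clojure sketch that models the same query in memory. It assumes the parallel aggregator behaves as "init each tuple, combine partial results pairwise, then present", and uses `group-by` as a stand-in for Cascalog's implicit grouping on `?id`. The names `tuples`, `init-fn`, `combine-fn`, `present-fn`, `collect-set*` and `fruits-by-id` are hypothetical and are not part of Cascalog's API.

```clojure
(ns collect-set-sketch
  (:require [clojure.set :as set]
            [clojure.string :as str]))

;; Hypothetical in-memory stand-in for the Cascalog source:
;; each row is [id fruit1 fruit2].
(def tuples
  [["1" "banana" "grape"]
   ["1" "apple" "apple"]
   ["1" "apple" "lemon"]
   ["2" "orange" "kiwi"]
   ["2" "kiwi" "apple"]])

;; The three phases of the collect-set aggregator, as plain functions.
(defn init-fn [s] #{s})    ; like :init-var, wrap each value in a singleton set
(def combine-fn into)      ; like :combine-var, merge two partial sets
(def present-fn identity)  ; like :present-var, return the merged set as-is

(defn collect-set*
  "Reduces a non-empty seq of values to the set of its distinct values."
  [values]
  (present-fn (reduce combine-fn (map init-fn values))))

(defn fruits-by-id
  "Groups rows by id (the role Cascalog's implicit grouping on ?id plays),
  collects each fruit column into a set, and joins the union of both sets."
  [rows]
  (for [[id group] (group-by first rows)]
    (let [fruits1 (collect-set* (map second group))
          fruits2 (collect-set* (map #(nth % 2) group))]
      [id (str/join "," (set/union fruits1 fruits2))])))
```

Calling `(fruits-by-id tuples)` yields one `[id fruits]` pair per id, with "1" covering apple, banana, grape and lemon and "2" covering apple, kiwi and orange. As in the Cascalog version, the order of fruits inside each joined string depends on set iteration order.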