@kyleburton
Created November 9, 2010 22:24
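;; Naive approach: count every key in memory. The key is the second
;; tab-separated field of each line. Returns a map of key -> count,
;; so duplicates are the entries whose count is greater than 1.
(defn find-dupes-naieve [inp-seq]
  (reduce (fn [res item]
            (assoc res item (inc (get res item 0))))
          {}
          (map #(second (.split %1 "\t")) inp-seq)))
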
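;; Bloom-filter approach: only keys the filter reports as already seen
;; are stored in the result map, so memory grows with the number of
;; repeated keys rather than with all keys. Counts start at 2 on the
;; second sighting; a false positive (probability about fp-prob) can
;; report a key that occurred only once.
;; `bloom` is assumed to alias a Bloom filter library that provides
;; make-optimal-filter, include? and add!.
(defn find-dupes-with-bloom-filter [inp-seq expected-size fp-prob]
  (let [flt (bloom/make-optimal-filter expected-size fp-prob)]
    (reduce (fn [res item]
              (if-not (bloom/include? flt item)
                (do
                  (bloom/add! flt item)
                  res)
                (assoc res item (inc (get res item 1)))))
            {}
            (map #(second (.split %1 "\t")) inp-seq))))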
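
The naive function returns a count for every key, including keys that occur only once, so a caller would probably still want to filter the result. A minimal sketch of that step follows. The helper `dupes-only` and the `sample-lines` data are hypothetical and not part of the original gist, and the counting is done with the core `frequencies` function so the example runs on its own.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not in the original gist): keep only the keys
;; whose count is greater than 1.
(defn dupes-only [counts]
  (into {} (filter (fn [[_ n]] (> n 1)) counts)))

;; Made-up sample input: tab-separated lines, key in the second field.
(def sample-lines ["1\ta" "2\tb" "3\ta" "4\tc" "5\tb" "6\ta"])

;; Count the second field of each line, as the naive function does.
(def counts
  (frequencies (map #(second (str/split % #"\t")) sample-lines)))

(dupes-only counts)
;; => {"a" 3, "b" 2}
```

The same filter could be applied to the Bloom-filter result, although there the map already excludes most singletons and any that remain are false positives.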