@ithayer
Created July 28, 2012 21:31
Clojure Thrush Operator Code Examples
;; Start of what the above flow would look like in Clojure.
;; Note: tokenize-word and extract-phrases are not defined in this gist.
(require 'clojure.string)
;; Tokenization step.
(->> (clojure.string/split (slurp "/tmp/X") #" ") tokenize-word)
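`tokenize-word` is not defined in this gist. The sketch below is a hypothetical stand-in, not the author's version: it assumes the function takes a seq of raw strings, lower-cases and trims each one, and drops the empty strings that `#" "` leaves behind when words are separated by more than one space. It reads from an in-memory string instead of `/tmp/X` so it can run anywhere.

```clojure
(require '[clojure.string :as str])

;; Hypothetical tokenize-word: NOT the gist author's definition.
;; Takes a seq of raw strings and returns a lazy seq of tokens.
(defn tokenize-word [words]
  (->> words
       (map str/lower-case)   ; normalize case
       (map str/trim)         ; strip leading/trailing whitespace
       (remove str/blank?)))  ; drop "" produced by repeated spaces

;; Same shape as the tokenization step, on an in-memory string.
(->> (str/split "The cat  sat." #" ") tokenize-word)
;; => ("the" "cat" "sat.")
```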
;; For debugging, just look at the first 100 words...
(->> (clojure.string/split (slurp "/tmp/X") #" ") tokenize-word (take 100))
;; or, look at some in the middle...
(->> (clojure.string/split (slurp "/tmp/X") #" ")
     tokenize-word
     (drop 100)
     (take 100))
;; or, look at some that have weird stuff (periods or parentheses) in them.
(->> (clojure.string/split (slurp "/tmp/X") #" ")
     tokenize-word
     (filter (fn [word] (re-find #"[.()]" word))) ;; re-find matches anywhere in the word.
     (take 10))
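The choice of regex function matters for this kind of filter. `re-matches` succeeds only when the entire string matches the pattern, so with `re-matches` a pattern like `#"\.|\(|\)"` keeps only words that are exactly `.`, `(`, or `)`. `re-find` succeeds if the pattern occurs anywhere in the string. The sample words below are made up for illustration.

```clojure
;; re-matches: the whole string must match the pattern.
(re-matches #"[.()]" "end.")  ;; => nil
;; re-find: any substring may match; returns the first match.
(re-find #"[.()]" "end.")     ;; => "."

;; Keep only words containing a period or a parenthesis.
(->> ["plain" "end." "(aside)" "word"]
     (filter (fn [word] (re-find #"[.()]" word))))
;; => ("end." "(aside)")
```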
;; Example of how the entire flow might be written.
;; This flow returns words and their counts, with singletons removed.
(->> (clojure.string/split (slurp "/tmp/X") #" ")
     tokenize-word
     extract-phrases
     frequencies ;; Slightly different from sort | uniq -c (the result is an unordered map), but accomplishes the same thing.
     (remove (fn [[_word n]] (= n 1))))
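Since `tokenize-word` and `extract-phrases` are not defined here, the sketch below runs only the counting half of the flow on a made-up sentence, skipping those two steps. `frequencies` builds a map from each word to its count; `remove` then walks that map as `[word count]` entries and drops the singletons. The final `(into {})` is an addition for readability, collecting the surviving entries back into a map.

```clojure
(require '[clojure.string :as str])

;; Made-up input; the real flow reads /tmp/X and tokenizes first.
(def text "the cat saw the dog and the dog saw a cat")

(def repeated-words
  (->> (str/split text #" ")
       frequencies                        ; {"the" 3, "cat" 2, ...}
       (remove (fn [[_word n]] (= n 1)))  ; drop words seen only once
       (into {})))                        ; collect entries back into a map

repeated-words
;; => {"the" 3, "cat" 2, "saw" 2, "dog" 2}  (printed key order may vary)
```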
;; You can debug a flow like this by inserting a (take ...) or (lg/spy) form at any step
;; to inspect intermediate results.
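Because `->>` threads the value in as the last argument, any function that takes the value last and returns it unchanged can be dropped into the pipeline to look at the data in flight. If `lg` is an alias for `clojure.tools.logging` (an assumption; the gist does not show its `ns` form), `lg/spy` logs a value and returns it. The dependency-free sketch below does the same with `println`; `inspect` is a hypothetical name, not a `clojure.core` function.

```clojure
;; Hypothetical debugging helper: print a label and the value flowing
;; through the pipeline, then return the value unchanged.
(defn inspect [label x]
  (println label (pr-str x))
  x)

(->> [:a :b :c :a]
     frequencies
     (inspect "after frequencies:")   ; prints the intermediate map
     (remove (fn [[_k n]] (= n 1))))
;; => ([:a 2])
```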
;; Comparing thrush (->>, thread-last) with the threading operator (->, thread-first).
> (-> "test" println)
test
> (->> "test" println)
test
> (-> "test" (println "a")) ;; -> inserts the value as the first argument of each form...
test a
> (->> "test" (println "a")) ;; ->> (thrush) inserts it as the last argument...
a test
> (->> "test" (str "a ") (println "here is")) ;; and pushes it through each form in turn.
here is a test
;; Through + push = thrush.
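`macroexpand` makes the argument placement visible: it returns the ordinary call that each threading form rewrites into, without running it. The last expansion shows the value being pushed through `str` and then into `println`.

```clojure
;; -> (thread-first) puts the value in the first argument slot.
(macroexpand '(-> "test" (println "a")))
;; => (println "test" "a")

;; ->> (thread-last) puts the value in the last argument slot.
(macroexpand '(->> "test" (println "a")))
;; => (println "a" "test")

;; Each step's result becomes the last argument of the next step.
(macroexpand '(->> "test" (str "a ") (println "here is")))
;; => (println "here is" (str "a " "test"))
```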
;; Explanation of commands used above.
sort -- sorts input lines lexicographically
uniq -- removes adjacent duplicate lines (so the input must be sorted first)
uniq -c -- prefixes each line with the number of times it occurred
egrep -v -- prints only the lines that do not match a particular expression
egrep -v "^ *1 " -- removes lines that only showed up once (uniq -c pads counts with leading spaces)
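Each shell step has a rough counterpart in `clojure.core`, shown below on a made-up list of lines: `sort` for `sort`, `dedupe` (Clojure 1.7+) for `uniq`, `partition-by` plus `count` for `uniq -c`, and `remove` for `egrep -v`. This is one possible mapping, not a claim about how the gist's author wrote it.

```clojure
(def lines ["b" "a" "b" "c" "b"])

;; sort | uniq  ->  sort, then drop consecutive duplicates
(->> lines sort dedupe)
;; => ("a" "b" "c")

;; sort | uniq -c | egrep -v "^ *1 "
(->> lines
     sort
     (partition-by identity)                          ; group adjacent equal lines
     (map (fn [group] [(count group) (first group)])) ; [count line], like uniq -c
     (remove (fn [[n _line]] (= n 1))))               ; drop lines seen once
;; => ([3 "b"])
```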