Clojure Thrush Operator Code Examples
Gist ithayer/3194881, created July 28, 2012 21:31
;; Start of what the above flow would look like in Clojure.
(require 'clojure.string)
;; Tokenization step.
(->> (clojure.string/split (slurp "/tmp/X") #" ") tokenize-word)
;; For debugging, just look at the first 100 words...
(->> (clojure.string/split (slurp "/tmp/X") #" ") tokenize-word (take 100))
;; or look at some in the middle...
(->> (clojure.string/split (slurp "/tmp/X") #" ")
     tokenize-word
     (drop 100)
     (take 100))
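;; or look at some that have weird stuff (periods or parentheses) in them.
;; re-find matches anywhere in the word; re-matches would only match words
;; that are exactly ".", "(" or ")".
(->> (clojure.string/split (slurp "/tmp/X") #" ")
     tokenize-word
     (filter (fn [word] (re-find #"[.()]" word)))
     (take 10))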
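`tokenize-word` is not defined in the gist. To try the debugging flows above without a `/tmp/X` file, here is a minimal sketch under a loud assumption: this hypothetical `tokenize-word` takes a seq of raw strings and returns trimmed, lowercased, non-empty tokens. The real function may do something quite different, and `sample-text` is made-up input standing in for `(slurp "/tmp/X")`.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the gist's tokenize-word (assumption, not the
;; original): trims and lowercases each string and drops empty ones.
(defn tokenize-word [words]
  (->> words
       (map str/trim)
       (map str/lower-case)
       (remove str/blank?)))

;; Made-up input standing in for (slurp "/tmp/X").
;; The double space produces an empty string that tokenize-word drops.
(def sample-text "The cat (a big one) sat.  The cat ran.")

;; Peek at the first few tokens.
(->> (str/split sample-text #" ")
     tokenize-word
     (take 3))
;; => ("the" "cat" "(a")

;; Tokens that still contain a period or parenthesis.
(->> (str/split sample-text #" ")
     tokenize-word
     (filter (fn [word] (re-find #"[.()]" word)))
     (take 10))
;; => ("(a" "one)" "sat." "ran.")
```

Because each step is just another form in the `->>` chain, swapping `(take 3)` for `(drop 5)` or a different filter changes what you inspect without touching the tokenizer.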
;; Example of how the entire flow might be written.
;; This flow returns words and their counts, with singletons removed.
(->> (clojure.string/split (slurp "/tmp/X") #" ")
     tokenize-word
     extract-phrases
     frequencies ;; Returns an unsorted map of word -> count instead of sorted lines, but accomplishes the same thing as sort + uniq -c.
     (remove (fn [[word cnt]] (= 1 cnt))))
;; Note that you could debug this flow by throwing in a (take ...) or (lg/spy) form
;; anywhere in there to inspect intermediate results.
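The counting and filtering steps can be tried on their own. This sketch feeds a hand-written vector straight into `frequencies`, skipping the gist's `tokenize-word` and `extract-phrases`. `peek-seq` and `counts-without-singletons` are hypothetical helpers written for illustration; `peek-seq` is not `lg/spy`, it simply prints the first few items flowing through and returns its input unchanged, so it can sit anywhere in a `->>` chain.

```clojure
;; Hypothetical debugging helper: print up to n items, return coll unchanged.
;; coll comes last so ->> can thread into it.
(defn peek-seq [n coll]
  (println "peek:" (take n coll))
  coll)

;; Hypothetical helper: count items and drop those seen only once.
(defn counts-without-singletons [words]
  (->> words
       (peek-seq 3)                       ; inspect the raw input
       frequencies                        ; map of word -> count
       (peek-seq 3)                       ; inspect [word count] entries
       (remove (fn [[_ cnt]] (= 1 cnt)))  ; drop words seen only once
       (into {})))                        ; back to a map for readability

(counts-without-singletons ["a" "b" "a" "c" "b" "a"])
;; => {"a" 3, "b" 2}
```

Naming the count `cnt` rather than `count` avoids shadowing `clojure.core/count` inside the function.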
;; Comparing thrush with the threading operator.
> (-> "test" println)
test
> (->> "test" println)
test
> (-> "test" (println "a"))   ;; Threading inserts the value as the first argument of the next form...
test a
> (->> "test" (println "a"))  ;; thrush inserts it as the last argument...
a test
> (->> "test" (str "a ") (println "here is"))  ;; and pushes each result through the following forms.
here is a test
;; Through + push = thrush.
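One way to see exactly where each macro puts the value is to expand the forms with `clojure.walk/macroexpand-all` and to capture what they print with `with-out-str`. Note the trailing space in `(str "a ")`: `str` concatenates without adding spaces, so `(str "a" "test")` would produce `"atest"`, and `println` already separates its arguments with one space.

```clojure
(require 'clojure.walk)

;; -> places "test" first in the argument list.
(clojure.walk/macroexpand-all '(-> "test" (println "a")))
;; => (println "test" "a")

;; ->> places "test" last, then threads that result last into the next form.
(clojure.walk/macroexpand-all '(->> "test" (str "a ") (println "here is")))
;; => (println "here is" (str "a " "test"))

;; Capture the printed output instead of reading it off the REPL.
;; Returns "here is a test" followed by the platform's line separator.
(with-out-str (->> "test" (str "a ") (println "here is")))
```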
;; Explanation of commands used above.
sort -- sorts input lines lexicographically
uniq -- removes adjacent duplicate lines, so input must be sorted to remove all duplicates
uniq -c -- also prefixes each line with how many times it occurred
egrep -v -- removes lines that match a particular expression
egrep -v "^ *1 " -- removes lines that showed up only once (uniq -c pads the count with leading spaces)
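For comparison with `frequencies`, the shell pipeline can be mirrored step by step in Clojure. This is a sketch that assumes the input is already a seq of lines; `sort-uniq-c-without-singletons` is a hypothetical name. `partition-by identity` groups adjacent equal items the way `uniq` does, `(juxt count first)` builds `uniq -c`-style `[count line]` pairs, and the final `remove` plays the role of `egrep -v`. Unlike the `frequencies` version, the result stays sorted.

```clojure
;; Mirrors: sort | uniq -c | egrep -v "^ *1 "
(defn sort-uniq-c-without-singletons [lines]
  (->> lines
       sort                                ; sort
       (partition-by identity)             ; uniq: group adjacent duplicates
       (map (juxt count first))            ; uniq -c: [count line] pairs
       (remove (fn [[cnt _]] (= 1 cnt))))) ; egrep -v "^ *1 ": drop singletons

(sort-uniq-c-without-singletons ["pear" "apple" "pear" "fig" "apple" "pear"])
;; => ([2 "apple"] [3 "pear"])
```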