Created
May 28, 2012 13:25
EuroClojure Learnings: TSV file
; My "lightbulb" moment at EuroClojure 2012 was that Clojure is at its most
; powerful when you build abstractions in the data and keep the functions as
; generic as possible so you can chain them easily.

; For example, here is some code I wrote a few months ago to parse a Freebase TSV file
(require '[clojure.string :as str]
         '[clojure.java.io :refer [reader]])

; Header names to rename on the way in
(def keywords {"id" "freebase_id"})

(defn override-keywords [field-name]
  (let [alternative (get keywords field-name)]
    (if (nil? alternative) field-name alternative)))

(defn get-field-names [line] (map override-keywords (str/split line #"\t")))

(defn get-fields [line] (str/split line #"\t"))

(defn map-fields [hdrs line] (zipmap hdrs line))

(defn map-file [path]
  (with-open [rdr (reader path)]
    ; into [] reads every line before with-open closes the reader
    (let [lines (into [] (line-seq rdr))
          hdrs  (get-field-names (first lines))
          data  (map get-fields (rest lines))]
      (map #(map-fields hdrs %1) data))))

; On the other hand, here is some code Malcolm Sparks presented that does much the same
; See https://github.com/malcolmsparks/euroclojure2012
(defn get-dataset []
  (let [[header & rows]
        (->> "Downloads/olympic_games.tsv"
             (file (System/getProperty "user.home"))
             reader line-seq
             (map #(split % #"\t")))]
    (map #(zipmap header %) rows)))

; I'm doing a bit more, but there is no doubt that Malcolm's solution is more flexible
; and makes better use of the core library.
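
; A minimal sketch (my own assumption, not something from Malcolm's talk) of how
; the "bit more" could be folded into the generic style: the header overrides are
; already plain data, so clojure.core/replace can apply them with no helper function.
; parse-tsv is a hypothetical name, and it takes a seq of lines rather than a path
; so it can be tried at the REPL without a file.
(require 'clojure.string)

(defn parse-tsv [renames lines]
  (let [[header & rows] (map #(clojure.string/split % #"\t") lines)]
    ; replace swaps any header found as a key in renames, leaving the rest alone
    (map #(zipmap (replace renames header) %) rows)))

(parse-tsv {"id" "freebase_id"} ["id\tname" "/m/01\tAthens"])
; => ({"freebase_id" "/m/01", "name" "Athens"})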