@elfenlaid
Last active December 31, 2015 17:19
clojure file processing
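(require 'clojure.java.io)

(defn lazy-open
  "Returns a lazy seq of the lines in file. The reader is closed only once
  the seq has been consumed to the end, so a partial read leaves it open."
  [file]
  (letfn [(helper [rdr]
            (lazy-seq
              (if-let [line (.readLine rdr)]
                (cons line (helper rdr))
                (do (.close rdr) (println "closed") nil))))]
    (do (println "opening")
        (helper (clojure.java.io/reader file)))))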
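(require 'clojure.string)

(defn parse-cvs
  "Eagerly folds semicolon-separated lines into a map of
  first field -> set of [second-field third-field] pairs."
  [lines]
  (loop [sx lines
         m {}]
    (if-let [line (first sx)]
      (let [vs (clojure.string/split line #";")]
        (recur (rest sx)
               (merge-with into m {(vs 0) #{[(vs 1) (vs 2)]}})))
      m)))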
;; file obtained from: http://www.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip
;; roughly 1.1 million lines in total, or around 30 MB on disk
(take 3 (parse-cvs (lazy-open "resources/BX-CSV-Dump/BX-Book-Ratings.csv")))
;; ...and it consumes 500+ MB of memory in a lein REPL.
;; Is it possible to somehow reduce memory consumption?
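
;; One possible direction, as a minimal sketch rather than a definitive answer.
;; parse-cvs is eager, so the whole file is folded into the map before
;; `take 3` sees anything. My guess (not measured) is that most of the memory
;; is the result map itself: one vector of two strings per input line.
;; If only an aggregate is needed, keeping less per line should help.
;; `process-file` and `rating-counts` are hypothetical names, not part of the
;; original gist; the per-user count is just an example aggregate.
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn rating-counts
  "Hypothetical helper: counts lines per first field (the user id)
  instead of storing every [second-field third-field] pair."
  [lines]
  (reduce (fn [m line]
            ;; limit 2: split off the first field only, leave the rest unsplit
            (let [user (first (string/split line #";" 2))]
              (assoc m user (inc (get m user 0)))))
          {}
          lines))

(defn process-file
  "Hypothetical helper: applies f to a lazy seq of the lines in file and
  closes the reader afterwards, even if f throws. f must consume the lines
  eagerly (e.g. with reduce), because the reader is closed once f returns."
  [file f]
  (with-open [rdr (io/reader file)]
    (f (line-seq rdr))))

(comment
  ;; usage against the same file as above
  (take 3 (process-file "resources/BX-CSV-Dump/BX-Book-Ratings.csv"
                        rating-counts)))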