@bcambel
Forked from atroche/fast_newline_count.clj
Created June 3, 2017 07:08
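;; Assumes (:require [clojure.core.async :refer [chan go-loop <! >! <!! close!]])
;; and (:import [java.io FileInputStream]) in the enclosing namespace.
(with-open [file-stream (FileInputStream. ten-gb-filename)]
  (let [channel (chan 500)
        ;; make four workers to read byte arrays off the channel
        ;; (doall because `for` is lazy and the workers should start right away):
        counters (doall
                  (for [_ (range 4)]
                    (go-loop [newline-count 0]
                      (let [barray (<! channel)]
                        (if (nil? barray) ;; channel is closed and drained
                          newline-count
                          (recur (+ newline-count
                                    (count-newlines barray))))))))]
    (go-loop []
      ;; a fresh array per read, so a short read leaves only zero bytes in the
      ;; unused tail, and those are never counted as newlines
      (let [barray (byte-array one-meg) ;; 1024*1024
            bytes-read (.read file-stream barray)]
        (if (pos? bytes-read) ;; .read returns -1 on EOF
          (do
            ;; this put parks once 500 arrays (about 500 MB) are waiting in the
            ;; channel, so as to not engorge the heap (learnt the hard way)
            (>! channel barray)
            (recur)) ;; keep going until EOF
          (close! channel))))
    ;; block until every worker has returned its count, then sum them
    (reduce + (map <!! counters))))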
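
;; The expression above uses three names it never defines: `ten-gb-filename`,
;; `one-meg` and `count-newlines`. What follows is a minimal sketch of what they
;; might look like, not the original author's code. The path is a placeholder,
;; and `count-newlines` assumes a newline is the single byte 10 (\n).
;; Evaluate these definitions before the expression above.
(def ten-gb-filename "/path/to/ten-gb-file.txt") ;; hypothetical path

(def one-meg (* 1024 1024)) ;; buffer size in bytes, matching the 1024*1024 comment

(defn count-newlines
  "Counts the bytes equal to 10 (\\n) in the byte array `barray`."
  ^long [^bytes barray]
  (let [n (alength barray)]
    (loop [i 0, total 0]
      (if (< i n)
        (recur (inc i)
               (if (== 10 (aget barray i)) (inc total) total))
        total))))

;; Usage: (count-newlines (.getBytes "a\nb\nc\n" "UTF-8")) counts three newlines.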