

@betamatt
Created December 21, 2014 19:18
Line sequencing
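(require '[clojure.string :as string])

(defn- split-lines
  "Takes a sequence of content chunks and returns a lazy sequence of individual lines.
  (\"abc\" \"de\\nabc\") becomes (\"abcde\" \"abc\")"
  [stream]
  ;; Stop once every chunk has been consumed.
  (when-let [chunk (first stream)]
    (let [remainder (rest stream)
          ;; Split off everything up to the first run of newlines.
          [line leftover] (string/split chunk #"\n+" 2)]
      (cond
        ;; Found a line break: emit the line and keep going lazily.
        leftover (lazy-seq (cons line (split-lines (cons leftover remainder))))
        ;; No break and no chunks left: the tail is the last line (if non-empty).
        (empty? remainder) (when (seq line) (list line))
        ;; No break yet: glue on the next chunk and try again.
        :else (recur (cons (str line (first remainder)) (rest remainder)))))))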
@willscripted

  • First, convert the sequence of chunks into a sequence of characters.
  • Second, make a reducer that goes from a char-seq to a line-seq.
  • Then run the output of the first step through the second.

The second step might already exist in clojure.core.
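A minimal sketch of those steps, assuming plain reduce plays the role of the reducer; chars->lines is a hypothetical name. Unlike split-lines it is eager, and it keeps blank lines as empty strings instead of collapsing runs of newlines.

```clojure
(defn chars->lines
  "Hypothetical sketch: folds a character sequence into a vector of lines."
  [chars]
  (let [[lines current]
        ;; Accumulator is [finished-lines chars-of-current-line].
        (reduce (fn [[lines current] c]
                  (if (= c \newline)
                    [(conj lines (apply str current)) []]  ; close the current line
                    [lines (conj current c)]))             ; extend the current line
                [[] []]
                chars)]
    ;; Keep a final line that had no trailing newline.
    (if (seq current)
      (conj lines (apply str current))
      lines)))

;; Step one flattens the chunks, step two folds them into lines.
(chars->lines (mapcat seq ["abc" "de\nabc"])) ;=> ["abcde" "abc"]
```

Because reduce has to walk the whole input, this version cannot stream an unbounded sequence of chunks the way the lazy versions in this thread can.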

@icambron

Arguably way worse than yours, but here's my take:

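(def chunks ["abc" "de\nabc"])

(defn dechunk [chunks]
  ;; Lazily flatten the chunks into one character sequence. Walking the
  ;; chunk sequence itself means an empty chunk such as "" no longer
  ;; ends the stream early.
  (lazy-seq
    (when-let [s (seq chunks)]
      (concat (first s) (dechunk (rest s))))))

(defn split-lines [chunks]
  (->>
    (dechunk chunks)
    (partition-by #(= % \newline))    ; runs of newlines vs. runs of other chars
    (remove #(= (first %) \newline))  ; drop every newline run, not just single ones
    (map #(apply str %))))

(split-lines chunks) ;=> ("abcde" "abc")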

@betamatt
Author

@icambron This looks like the fully baked version of something I tried after I discovered partition-by. I couldn't complete the thought in the end. The string splitty version was easier for me to reason about.

@icambron

@betamatt, yeah, the fact that partition-by keeps the newline runs as partitions of their own makes the solution pretty ugly. I think if I'd defined my own partition-by that didn't do that, it would have come out OK.

Edit: also, @-mentioning people doesn't seem to work in gists? This product needs some serious love.
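A minimal sketch of that idea, assuming a hypothetical helper split-by (not part of clojure.core) that works like partition-by but discards the delimiter elements instead of returning them as partitions. It stays lazy, and runs of delimiters, including leading and trailing ones, produce no empty partitions.

```clojure
(defn split-by
  "Hypothetical helper: lazily splits coll into runs of elements that do
  not satisfy pred, discarding the elements that do (the delimiters)."
  [pred coll]
  (lazy-seq
    ;; Skip any delimiters before the next run.
    (let [s (drop-while pred coll)]
      (when (seq s)
        ;; Take the run up to the next delimiter, then recurse on the rest.
        (let [[run more] (split-with (complement pred) s)]
          (cons run (split-by pred more)))))))

(map #(apply str %) (split-by #(= % \newline) "abcde\nabc")) ;=> ("abcde" "abc")
```

In a pipeline like the one above, a single split-by step would stand in for both the partition-by and the filtering step.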

@willscripted

Played around and came up with something similar to icambron's, but without all the fancy threading. Nice trick 👍 Noted...

(dechunk chunks) could be replaced with (mapcat seq chunks), I think.
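A sketch of icambron's pipeline with mapcat in place of dechunk. Here the newline filter is written as (remove #(= (first %) \newline)) so that runs of several newlines are dropped too; that choice is an assumption, made to match the #"\n+" split in the gist.

```clojure
(defn split-lines [chunks]
  (->> (mapcat seq chunks)               ; flatten the chunks into one char seq
       (partition-by #(= % \newline))    ; runs of newlines vs. runs of other chars
       (remove #(= (first %) \newline))  ; drop the newline runs
       (map #(apply str %))))            ; turn each remaining run back into a string

(split-lines ["abc" "" "de\nabc"]) ;=> ("abcde" "abc")
```

mapcat calls seq on each chunk, so an empty chunk simply contributes nothing rather than ending the sequence.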
