@jackrusher
Created August 26, 2013 15:22
A lowercase-only version of clojure.string/split that also returns the begin/end position (inclusive indices) of each word in the original string. Beautiful, but less efficient than a loop-based implementation.
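(require '[clojure.string :as string])

;; Characters allowed in a token: ASCII digits 0-9 and lowercase letters a-z.
(def legal-char (set (map char (concat (range 48 58) (range 97 123)))))

;; Lowercases s and returns a vector of maps, one per run of legal characters,
;; holding the token text (:s) and its inclusive :begin/:end indices.
(defn tokenize [s]
  (->> (map-indexed list (string/lower-case s))      ; (index char) pairs
       (partition-by (comp not legal-char second))   ; split into legal/illegal runs
       (filter (comp legal-char second first))       ; keep only the legal runs
       (mapv #(hash-map :s     (apply str (map second %))
                        :begin (first (first %))
                        :end   (first (last %))))))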
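
For comparison, here is a minimal sketch of what the loop-based implementation mentioned in the description might look like. The name `tokenize-loop` is hypothetical and not part of the gist. It walks the lowercased string once by index and slices each token with `subs`, so it does not build the intermediate sequence of index/char pairs. It is meant to return the same maps as `tokenize`, but it has not been benchmarked here.

```clojure
(require '[clojure.string :as string])

;; Same character class as the gist: ASCII digits 0-9 and lowercase a-z.
(def legal-char (set (map char (concat (range 48 58) (range 97 123)))))

;; Hypothetical loop-based variant (assumed name, not from the original gist).
;; start holds the index where the current token began, or nil between tokens.
(defn tokenize-loop [s]
  (let [^String lower (string/lower-case s)
        n (count lower)]
    (loop [i 0, start nil, acc []]
      (let [legal? (and (< i n) (legal-char (.charAt lower i)))]
        (cond
          ;; inside a token (or just entering one): remember where it began
          legal?  (recur (inc i) (or start i) acc)
          ;; just left a token, or hit end of input while inside one: emit it
          start   (let [acc (conj acc {:s     (subs lower start i)
                                       :begin start
                                       :end   (dec i)})]
                    (if (< i n) (recur (inc i) nil acc) acc))
          ;; illegal character between tokens: skip it
          (< i n) (recur (inc i) nil acc)
          ;; end of input, no open token
          :else   acc)))))

(tokenize-loop "Hello, World 42")
;; => [{:s "hello", :begin 0, :end 4}
;;     {:s "world", :begin 7, :end 11}
;;     {:s "42", :begin 13, :end 14}]
```

Both versions index into the lowercased string, so the positions match the original only when lowercasing leaves the string length unchanged, which holds for ASCII input.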