@pingles
Created November 3, 2010 16:13
(ns tree
  (:require [clojure.contrib.string :as str]
            [clojure.contrib.io :as io]))

(defrecord Node [postcode region-id])

;; Skip the CSV header row.
(def lines (rest (io/read-lines "/Users/paul/Work/pes_db.csv")))

;; One Node per line, with spaces stripped from the postcode.
(def postcodes-from-file
  (map (fn [x]
         (let [parts (str/split #"," x)
               postcode (str/replace-str " " "" (first parts))]
           (Node. postcode (second parts))))
       lines))

;; Drop the first character of the postcode, keeping the region id.
(defn next-record
  [record]
  (Node. (apply str (rest (:postcode record)))
         (:region-id record)))

;; Build a nested map with one character per level, e.g.
;; "AB1" -> {:A {:B {:1 {:region-id ...}}}}.
(defn record-to-tree
  ([record] (record-to-tree record {}))
  ([record tree]
     (let [postcode (:postcode record)]
       (if (nil? (seq postcode))
         {:region-id (:region-id record)}
         (assoc tree
           (keyword (str (first postcode)))
           (record-to-tree (next-record record) {}))))))

;; Deep-merge two trees; for a conflicting leaf value the left-hand one wins.
(defn merge-tree
  [tree other]
  (if (not (map? other))
    tree
    (merge-with merge-tree tree other)))

(def results (reduce merge-tree
                     {}
                     (map record-to-tree postcodes-from-file)))

;; Walk the tree one character at a time. get returns nil for a missing
;; branch, whereas calling nil as a function would throw.
(defn lookup-postcode
  [results postcode]
  (if (nil? (seq postcode))
    (:region-id results)
    (recur (get results (keyword (str (first postcode))))
           (rest postcode))))
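For comparison, here is a minimal sketch of the same nested-map tree built with `update-in` and queried with `get-in`, in plain Clojure with no contrib dependency. The names `postcode-path`, `insert-postcode`, `build-tree` and `lookup` are hypothetical, and the sample postcodes below are made up. It produces the same shape as `record-to-tree` plus `merge-tree` and, like `merge-tree`, keeps the first region id seen for a duplicate postcode.

```clojure
(ns postcode-trie)

;; Hypothetical helper: turn a postcode into the path of single-character
;; keywords used by the tree, e.g. "AB1" -> [:A :B :1].
(defn postcode-path [postcode]
  (mapv (comp keyword str) postcode))

;; Insert one [postcode region-id] pair. The region id sits under :region-id
;; at the end of the path; (or existing region-id) keeps the first value seen
;; for a duplicate postcode, as merge-tree keeps its left-hand value.
(defn insert-postcode [tree [postcode region-id]]
  (update-in tree (conj (postcode-path postcode) :region-id)
             (fn [existing] (or existing region-id))))

(defn build-tree [pairs]
  (reduce insert-postcode {} pairs))

;; Returns nil for an unknown postcode instead of throwing.
(defn lookup [tree postcode]
  (get-in tree (conj (postcode-path postcode) :region-id)))

;; Usage with made-up sample data:
(def sample-tree
  (build-tree [["AB12" "r1"] ["AB13" "r2"] ["AB1" "r3"] ["AB12" "r9"]]))

(lookup sample-tree "AB12") ;; "r1" (the later duplicate "r9" is ignored)
(lookup sample-tree "ZZ9")  ;; nil
```

Because each insert only touches one path, this avoids building a separate single-branch tree for every record and then merging it in.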
@jafingerhut

Try the code below, which builds a flat map from postcode to region id instead of a tree. It also avoids def'ing top-level symbols like lines or postcodes-from-file, so those intermediate sequences are not held in memory afterwards; depending on the version of Clojure you are using, each line may even become garbage as soon as the sequence advances past it.

(ns tree
  (:require [clojure.contrib.string :as str]
            [clojure.contrib.io :as io]))

(set! *warn-on-reflection* true)

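;; Add each [k v] pair from pair-coll to the transient map h.
;; Destructure the first pair, not the first two elements of the seq.
(defn add-entries-to-transient-map [h pair-coll]
  (if-let [s (seq pair-coll)]
    (let [[k v] (first s)]
      (recur (assoc! h k v) (rest s)))
    h))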

(defn map-from-pairs [pair-coll]
  (persistent! (add-entries-to-transient-map (transient {}) pair-coll)))

(def results
  (let [lines (rest (io/read-lines "/Users/paul/Work/pes_db.csv"))]
    (map-from-pairs (map (fn [x]
                           (let [parts (str/split #"," x)
                                 postcode (str/replace-str " "
                                                           ""
                                                           (first parts))]
                             [postcode (second parts)]))
                         lines))))

;; Now, instead of (lookup-postcode results postcode), you can use
;; (results postcode) to look up a postcode in the map results.
;; The keys have their spaces removed, so strip spaces from the query too.
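On current Clojure versions the monolithic `clojure.contrib` library is no longer maintained. A sketch of the same flat-map approach using `clojure.string` and `clojure.java.io` follows; it assumes a simple two-column CSV with a header row and no quoted commas, and `line->pair`, `pairs->map` and `load-postcodes` are hypothetical names. Since `into` already fills the map through a transient, the hand-written `assoc!` loop is not needed.

```clojure
(ns postcode-map
  (:require [clojure.string :as str]
            [clojure.java.io :as io]))

;; Parse one line "POSTCODE,REGION" into [postcode region-id], stripping
;; spaces from the postcode as the code above does.
;; Assumes no quoted commas in the CSV.
(defn line->pair [line]
  (let [[postcode region-id] (str/split line #",")]
    [(str/replace postcode " " "") region-id]))

;; into {} builds the map via a transient internally.
(defn pairs->map [lines]
  (into {} (map line->pair) lines))

;; Hypothetical loader: skips the header row, like (rest ...) above.
;; into is eager, so the map is fully built before with-open closes the reader,
;; and no top-level var holds on to the lines.
(defn load-postcodes [path]
  (with-open [r (io/reader path)]
    (pairs->map (rest (line-seq r)))))
```

Lookups then work exactly as in the comment above: `((load-postcodes "/Users/paul/Work/pes_db.csv") "AB12CD")`, with the query already stripped of spaces.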
