@postspectacular
Last active November 16, 2015 02:01
Convert CSV to RDF (stored in EDN format)
(defn load-house-sales
  "Loads CSV property sales data from given path, extracts
  set of columns and transforms column values.
  Returns seq of maps, one per CSV row/record."
  [path]
  (mapped-csv
   path
   ;; CSV column fields to extract
   #{"transaction_id" "price" "date_processed"
     "post_code" "property_type" "borough_code"}
   ;; CSV column transformers
   {:transaction_id #(subs % 1 (dec (count %)))
    :price          #(f/parse-int % 10)
    :date_processed #(.parse df %)}))

(defn sale->triples
  "Takes a single sales transaction record and returns
  seq of triples, using transaction UUID as common subject."
  [sale]
  (ff/map->facts
   {(:transaction_id sale) {"rdf:type"             "schema:SellAction"
                            "schema:price"         (:price sale)
                            "schema:priceCurrency" "GBP"
                            "schema:postalCode"    (:post_code sale)
                            "schema:purchaseDate"  (:date_processed sale)
                            "ws:onsID"             (:borough_code sale)
                            "ws:propertyType"      (:property_type sale)}}))

;; complete transformation pipeline
;; includes sampling step (only keep every 10th record) to save time
(->> "data/london-sales-2013-2014.csv"
     (io/resource)
     (load-house-sales)
     (take-nth 10)
     (write-triples-edn "data/sales-2013.edn"))
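
The gist relies on helpers that are not shown (`mapped-csv`, `f/parse-int`, `df`, `ff/map->facts`, `write-triples-edn`). To make the data flow easier to follow, here is a minimal, self-contained sketch of the same steps using only `clojure.core`. Everything in it is an assumption for illustration: `transform-row` and `map->triples` are hypothetical stand-ins, not the author's helpers. The sample rows are made up, and the braces around the transaction IDs are a guess at why the gist strips the first and last character. The date column is left out because the gist does not show how `df` is defined. The gist's pipeline also does not show where `sale->triples` is applied, so the explicit `mapcat` step below is a guess.

```clojure
;; Hypothetical stand-in for the column transformers passed to `mapped-csv`.
(def transformers
  {:transaction_id (fn [^String s] (subs s 1 (dec (count s)))) ; drop first and last char
   :price          (fn [^String s] (Long/parseLong s))})       ; decimal string -> long

(defn transform-row
  "Applies each transformer to the matching key of `row`,
  leaving all other keys untouched."
  [transformers row]
  (reduce-kv
   (fn [acc k f] (if (contains? acc k) (update acc k f) acc))
   row
   transformers))

(defn map->triples
  "Expands {subject {predicate object ...}} into a seq of
  [subject predicate object] vectors."
  [m]
  (for [[s po] m
        [p o]  po]
    [s p o]))

(defn sale->triples
  "Same shape as the gist's function, reduced to four predicates."
  [sale]
  (map->triples
   {(:transaction_id sale) {"rdf:type"             "schema:SellAction"
                            "schema:price"         (:price sale)
                            "schema:priceCurrency" "GBP"
                            "schema:postalCode"    (:post_code sale)}}))

;; Made-up sample rows, not taken from the real dataset.
(def rows
  [{:transaction_id "{AAAA-0001}" :price "250000" :post_code "E1 6AN"}
   {:transaction_id "{AAAA-0002}" :price "410000" :post_code "N1 9GU"}
   {:transaction_id "{AAAA-0003}" :price "99500"  :post_code "SE1 7PB"}])

(def triples
  (->> rows
       (map #(transform-row transformers %))
       (take-nth 2)            ; keep every 2nd record, starting with the first
       (mapcat sale->triples)  ; assumption: triples are built before writing
       (vec)))
```

Because `take-nth` always keeps the first element, sampling every 2nd of the three rows retains the first and third records. Each retained record expands into one triple per predicate, all sharing the transaction ID as subject.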