Created February 27, 2018 20:53 by defndaines (gist 8e4124fa2d6c530d477e06f91223188d)
Example using Jsoup in Clojure to parse a webpage
{:deps
 {org.jsoup/jsoup {:mvn/version "1.11.2"}}}
(import '[org.jsoup Jsoup])

(def url "https://en.wikipedia.org/wiki/World_Fantasy_Award%E2%80%94Novel")

;; Fetch the page and parse it into an org.jsoup.nodes.Document.
(def html (.get (Jsoup/connect url)))

;; Winning entries are the table rows highlighted with this background color.
(def winners (.select html "tr[style=background:#B0C4DE;]"))
;; Jsoup Selectors: https://jsoup.org/cookbook/extracting-data/selector-syntax

(defn pull-details
  "Extracts the year, author, and title text from one winner row."
  [elem]
  (letfn [(text-from [selector] (.text (.select elem selector)))]
    {:year (text-from "th a") ; empty for the second winner in a tied year
     :author (text-from "span.fn a")
     :title (text-from "td i")}))
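
;; `conj` onto a list prepends, so the result reverses the page order and
;; `(first acc)` is always the row processed just before this one.
;; A row with an empty year (a tie) inherits that previous row's year.
(def awards
  (reduce
    (fn [acc e]
      (let [book (pull-details e)]
        (if (empty? (:year book))
          (let [year (:year (first acc))]
            (conj acc (assoc book :year year)))
          (conj acc book))))
    '()
    winners))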
;; ({:year "2017" :author "Claire North" :title "The Sudden Appearance of Hope"}
;;  {:year "2016" :author "Anna Smaill" :title "The Chimes"}
;;  ,,,)
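To try the selectors without a network request, a small hand-written HTML string can be parsed with `Jsoup/parse` instead of fetching the page. This is a sketch under stated assumptions: the markup is hypothetical (two highlighted winner rows from a tied year, where only the first row carries the year cell, plus one unhighlighted nominee row), the names `sample-html`, `sample-doc`, `sample-winners`, and `extracted` are illustrative, and the jsoup dependency from the deps map above is assumed to be on the classpath. The extraction from `pull-details` is inlined so the block runs on its own.

```clojure
(import '[org.jsoup Jsoup])

;; Hypothetical markup imitating the Wikipedia table layout; not real page data.
(def sample-html
  (str "<table>"
       "<tr style=\"background:#B0C4DE;\">"
       "<th rowspan=\"2\"><a href=\"#\">1999</a></th>"
       "<td><span class=\"fn\"><a href=\"#\">Author A</a></span></td>"
       "<td><i>Title A</i></td></tr>"
       "<tr style=\"background:#B0C4DE;\">"   ; tied winner: no year cell
       "<td><span class=\"fn\"><a href=\"#\">Author B</a></span></td>"
       "<td><i>Title B</i></td></tr>"
       "<tr><th><a href=\"#\">1999</a></th><td>Nominee</td></tr>"
       "</table>"))

;; Parse from a string rather than Jsoup/connect, so no network is needed.
(def sample-doc (Jsoup/parse sample-html))

;; Same attribute selector as `winners`; the unhighlighted row is skipped.
(def sample-winners (.select sample-doc "tr[style=background:#B0C4DE;]"))

;; Same extraction as `pull-details`, inlined. Elements is a java.util.List,
;; so `for` can iterate it directly.
(def extracted
  (for [row sample-winners]
    {:year   (.text (.select row "th a"))
     :author (.text (.select row "span.fn a"))
     :title  (.text (.select row "td i"))}))

extracted
;; => ({:year "1999" :author "Author A" :title "Title A"}
;;     {:year "" :author "Author B" :title "Title B"})
```

The empty `:year` on the second row is exactly the tie case that the `reduce` in `awards` fills in.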
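The tie handling inside `awards` does not depend on Jsoup, so it can be exercised on plain maps. This is a minimal sketch: `fill-missing-years` and `sample-books` are hypothetical names, and the rows are made-up stand-ins for `pull-details` output rather than real award data.

```clojure
(defn fill-missing-years
  "Hypothetical helper isolating the reduce from `awards`. Returns rows in
  reverse input order (conj on a list prepends), copying the previous row's
  :year into any row whose :year is empty."
  [rows]
  (reduce
    (fn [acc book]
      (if (empty? (:year book))
        ;; (first acc) is the row added just before this one
        (conj acc (assoc book :year (:year (first acc))))
        (conj acc book)))
    '()
    rows))

;; Made-up rows in page order; the second mimics a tie with no year cell.
(def sample-books
  [{:year "2001" :author "Author A" :title "Title A"}
   {:year ""     :author "Author B" :title "Title B"}
   {:year "2002" :author "Author C" :title "Title C"}])

(fill-missing-years sample-books)
;; => ({:year "2002" :author "Author C" :title "Title C"}
;;     {:year "2001" :author "Author B" :title "Title B"}
;;     {:year "2001" :author "Author A" :title "Title A"})
```

If page order matters, one option is to reduce into a vector `[]` instead of `'()` and read the previous row with `peek` rather than `first`, since `conj` appends to a vector and `peek` returns its last element.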