Last active
April 2, 2024 16:00
Web scraping in Clojure with Jsoup
(ns scraping.core
  (:gen-class)
  (:import (org.jsoup Jsoup)
           (org.jsoup.select Elements)
           (org.jsoup.nodes Element)))

;; Page listing stocks at 52-week highs.
(def URL "http://www.smh.com.au/business/markets/52-week-highs?page=-1")

;; Fetch URL over HTTP and parse the response into a Jsoup Document.
(defn get-page []
  (.get (Jsoup/connect URL)))

;; Return the elements of page that match the CSS selector css.
(defn get-elems [page css]
  (.select page css))

(defn -main
  "Fetch the list of stocks that have made new highs"
  [& args]
  (let [html (get-page)
        elems (get-elems html "#content > section > table > tbody > tr > th > a")]
    ;; Print the link text of every matching table row header.
    (println (for [e elems] (.text e)))))
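
The selector above can be tried without any network access by parsing an HTML string instead of fetching the live page. The following is a minimal sketch, not part of the original gist: the namespace `scraping.offline-sketch`, the `sample-html` markup, and the `link-texts` helper are all hypothetical names invented for illustration, and the sample markup only assumes the page nests its links the way the selector implies.

```clojure
(ns scraping.offline-sketch
  (:import (org.jsoup Jsoup)))

;; Hypothetical markup that mirrors the nesting the selector expects.
;; The real page structure may differ.
(def sample-html
  "<div id=\"content\"><section><table><tbody>
   <tr><th><a href=\"#\">ABC</a></th></tr>
   <tr><th><a href=\"#\">XYZ</a></th></tr>
   </tbody></table></section></div>")

(defn link-texts
  "Parse the HTML string html and return a vector holding the text of
  every element that matches the CSS selector css."
  [^String html ^String css]
  (mapv #(.text %) (.select (Jsoup/parse html) css)))

;; Same selector as in -main, applied to the sample markup.
(link-texts sample-html "#content > section > table > tbody > tr > th > a")
```

Using `mapv` here returns an eager vector, which is convenient at the REPL; the `for` in `-main` produces a lazy sequence that `println` realizes when it prints.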
(defproject scraping "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.jsoup/jsoup "1.7.3"]]
  :main ^:skip-aot scraping.core
  :target-path "target/%s"
  :profiles {:uberjar {:aot :all}})