@ishideo
Forked from borkdude/scrape_tables.clj
Created April 8, 2021 09:40
Extract HTML tables with babashka and bootleg
(ns scrape
  (:require [babashka.pods :as pods]
            [clojure.walk :as walk]))

;; Load the bootleg pod. "bootleg" must be installed on the PATH;
;; use "./bootleg" for a local binary.
(pods/load-pod "bootleg")

;; Fetch the page HTML as a string.
(require '[babashka.curl :as curl])
(def clojure-html (:body (curl/get "https://en.wikipedia.org/wiki/Clojure")))

;; The pod's namespaces only exist after load-pod, so require them here.
(require '[pod.retrogradeorbit.bootleg.utils :refer [convert-to]])
(def hiccup (convert-to clojure-html :hiccup))

;; Walk the hiccup tree and collect every [:table ...] vector.
(def tables (atom []))
(walk/postwalk (fn [node]
                 (when (and (vector? node)
                            (= :table (first node)))
                   (swap! tables conj node))
                 node)
               hiccup)

(count @tables) ;; 15
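
Once the tables are collected, a natural next step is to turn one of them into rows of cell text. The following is a minimal sketch and not part of the original gist: `sample-table`, `element?`, `text-of`, and `table-rows` are hypothetical names, and the sample data is a made-up placeholder so the snippet runs without the pod or network access. It assumes each table is a plain hiccup vector like the ones stored in `tables`.

```clojure
(require '[clojure.string :as str])

;; Hypothetical placeholder shaped like one element of @tables.
(def sample-table
  [:table {:class "wikitable"}
   [:tbody
    [:tr [:th "Name"] [:th "Value"]]
    [:tr [:td "alpha"] [:td [:a {:href "#"} "1"]]]]])

(defn element?
  "True when node is a hiccup vector whose tag is `tag`."
  [tag node]
  (and (vector? node) (= tag (first node))))

(defn text-of
  "Joins every string nested inside a hiccup node.
  Attribute maps are not descended into, so their values are skipped."
  [node]
  (->> (tree-seq vector? seq node)
       (filter string?)
       (str/join)
       (str/trim)))

(defn table-rows
  "Returns a vector of rows, each row a vector of cell texts."
  [table]
  (->> (tree-seq vector? seq table)
       (filter #(element? :tr %))
       (mapv (fn [tr]
               (->> (rest tr)
                    (filter #(or (element? :th %) (element? :td %)))
                    (mapv text-of))))))

(table-rows sample-table)
;; => [["Name" "Value"] ["alpha" "1"]]
```

With the script above, `(table-rows (first @tables))` would apply the same idea to a scraped table. Note that this sketch also picks up rows of any table nested inside the one passed in.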