@bcambel
Created January 20, 2015 21:38
Fetch images
(ns fetcher.core
  (:require [clojure.java.io :as io]
            [clojure.string]
            [net.cgrand.enlive-html :refer [html-resource select]]))
(defn fetch-all-images-from-url [src-url dest-folder]
  (let [; get root from src-url: http://google.com/alias1/2/3 ---> http://google.com
        root-url (clojure.string/join "/" (take 3 (clojure.string/split src-url #"/")))
        ; if the image url starts with "/" (written \/ as a Clojure character
        ; literal, like \a, \b, etc.), prepend the root url; otherwise keep it as-is
        complete-url (fn [url]
                       (if (= (first url) \/)
                         (str root-url url)
                         url))
        ; get page source
        html-src (html-resource (java.net.URL. src-url))
        ; parse html into a set of {:url ... :img-name ...} maps (a set avoids
        ; duplicates); <img> tags without a src attribute are skipped
        image-list (set (for [src (keep #(get-in % [:attrs :src])
                                        (select html-src #{[:img]}))
                              :let [url (complete-url src)]]
                          {:url url
                           :img-name (last (clojure.string/split url #"/"))}))
        ; save to file function
        fetch-to-file (fn [url file]
                        (with-open [in (io/input-stream url)
                                    out (io/output-stream file)]
                          (io/copy in out)))]
    ; actual work here
    (doseq [{:keys [url img-name]} image-list]
      (println "Fetching" url "...")
      (fetch-to-file url (str dest-folder "/" img-name)))))
; running (the destination folder must already exist):
(time (fetch-all-images-from-url "http://www.reddit.com" "/tmp/imgs"))
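
The `complete-url` function above only handles image paths that start with "/". As a minimal sketch of one alternative, the snippet below resolves any `src` value against the page URL using `java.net.URI`. The namespace `fetcher.resolve-sketch` and the helper `resolve-url` are hypothetical names that are not part of the gist. The helper assumes `src` is a syntactically valid URI reference; values containing spaces or other illegal characters would make `URI` throw.

```clojure
(ns fetcher.resolve-sketch
  (:import (java.net URI)))

(defn resolve-url
  "Hypothetical helper (not in the gist): resolves `src`, which may be
  absolute, root-relative, path-relative or protocol-relative, against
  `page-url`. Returns nil when `src` is nil or empty."
  [page-url src]
  (when (seq src)
    ;; URI.resolve(String) applies the standard relative-reference rules
    (str (.resolve (URI. page-url) ^String src))))

;; usage
(resolve-url "http://example.com/a/b/page.html" "/img/logo.png")
;; => "http://example.com/img/logo.png"
(resolve-url "http://example.com/a/b/page.html" "pic.png")
;; => "http://example.com/a/b/pic.png"
```

A helper like this could stand in for `complete-url`, with `src-url` passed as the first argument; the `root-url` binding would then no longer be needed.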