@seratch
Created January 23, 2012 16:05
Scraping picasaweb example
// libraryDependencies += "org.jsoup" % "jsoup" % "1.6.1"
import org.jsoup._
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}
import collection.JavaConverters._ // explicit .asScala instead of implicit JavaConversions

val url = "https://picasaweb.google.com/xxx"
val dir = Paths.get("downloaded")
Files.createDirectories(dir) // make sure the target directory exists before writing

Jsoup.connect(url).get().select("div img").asScala foreach { image =>
  // Thumbnails are linked at s128; ask for a larger size instead (limitation?)
  val src = image.attr("src").replaceFirst("s128", "s2000")
  println("Start downloading " + src + "...")
  val is = new URL(src).openStream
  try {
    // Stream straight to disk, using the last URL path segment as the file name
    Files.copy(is, dir.resolve(src.split("/").last), StandardCopyOption.REPLACE_EXISTING)
  } finally {
    is.close() // always release the connection, even if the copy fails
  }
}
println("Done.")