@maxx-coffee
Created May 18, 2013 20:40
Pinterest-style image scraping using Nokogiri & FastImage
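require 'nokogiri'
require 'open-uri'
require 'fastimage'

# Return the absolute URLs of all images on the page that are larger than 1x1 px.
def scrape_images(url)
  doc = Nokogiri::HTML(URI.open(url))
  # Reduce the page URL to scheme + host, e.g. "http://example.com"
  base = url.slice(/\Ahttps?:\/\/[a-z0-9\-.]+/i)
  images = []
  doc.css('img').each do |img|
    path = img['src']
    # Skip <img> tags without a usable src attribute
    next if path.nil? || path.empty?

    src = if path =~ /\Ahttps?:\/\//i
            # Already a complete link
            path
          elsif path.start_with?('//')
            # Protocol-relative link: reuse the page's scheme
            base[/\Ahttps?:/i] + path
          elsif path.start_with?('/')
            # Root-relative path
            base + path
          else
            # Relative path (joined to the site root)
            base + '/' + path
          end

    # FastImage.size returns nil when the image cannot be fetched or parsed
    size = FastImage.size(src)
    images << src if size && size[0] > 1 && size[1] > 1
  end
  images
end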
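
The relative-path branch of `scrape_images` joins the `src` to the site root, so a page at `/blog/post.html` that references `img/a.png` ends up pointing at `/img/a.png` rather than `/blog/img/a.png`. One possible alternative is to let Ruby's standard `URI.join` resolve every `src` against the full page URL. The sketch below assumes this is acceptable for your pages; `resolve_src` is a hypothetical helper name, not part of the gist.

```ruby
require 'uri'

# Hypothetical helper (not from the original gist): resolve an <img> src
# against the URL of the page it was found on.
# Returns nil when the src is missing or cannot be parsed as a URI.
def resolve_src(page_url, src)
  return nil if src.nil? || src.strip.empty?

  URI.join(page_url, src.strip).to_s
rescue URI::InvalidURIError
  nil
end

page = 'http://example.com/blog/post.html'

puts resolve_src(page, 'img/a.png')                   # relative to the page's directory
puts resolve_src(page, '/img/a.png')                  # root-relative
puts resolve_src(page, '//cdn.example.com/a.png')     # protocol-relative
puts resolve_src(page, 'https://other.example/a.png') # already absolute
```

If you go this way, the whole `if`/`elsif` chain in `scrape_images` could shrink to a single call followed by `next if src.nil?`, at the cost of depending on `URI.join` rejecting malformed `src` values.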