@lewisou
Created April 28, 2012 01:59
Scrape Images from a web page.
# Add this line to your application's Gemfile.
gem 'rest-client', '~> 1.6.7'

# In your code, scrape all image URLs from a page. We use www.designbombs.com as the example.
require 'rest-client' # not needed if Bundler.require already loads your gems
rs = []
RestClient.get('http://www.designbombs.com/') do |response, _req, _res|
  if response.code == 200
    # scan yields an array of capture groups per match, so destructure it to get the URL string.
    response.to_s.scan(/<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>/) do |(url)|
      rs << url
    end
  end
end
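# The src values collected above may be relative paths (for example "/images/logo.png"),
# which cannot be fetched as they are. A minimal sketch of one way to resolve them against
# the page URL with Ruby's standard URI library follows. The helper name extract_image_urls,
# the IMG_SRC constant, and the sample HTML are assumptions made for illustration; they are
# not part of the original snippet.

```ruby
require 'uri'

# Same pattern as the snippet above, made case-insensitive so <IMG SRC=...> also matches.
IMG_SRC = /<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>/i

# Hypothetical helper: returns the absolute URLs of all <img> tags found in html.
def extract_image_urls(html, base_url)
  html.scan(IMG_SRC).flatten.map do |src|
    begin
      # URI.join leaves absolute URLs untouched and resolves relative ones.
      URI.join(base_url, src.strip).to_s
    rescue URI::Error
      nil # skip src values that are not valid URIs
    end
  end.compact.uniq
end

# Made-up sample markup, used only to show the behaviour without a network call.
html = <<~HTML
  <img class="logo" src="/images/logo.png">
  <img src='http://cdn.example.com/a.jpg' alt="a">
  <img src="photos/b.gif">
HTML

puts extract_image_urls(html, 'http://www.designbombs.com/gallery/')
```

# With the real page you would pass response.to_s as html and the URL you requested as base_url.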
# Then save the URLs to the database,
# or fetch each file's content and save it locally.
rs.each do |url|
  RestClient.get(url) do |response, _req, _result|
    # response.to_s holds the file content; save it with Paperclip or S3
  end
end
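# If you only need the files on local disk rather than in Paperclip or S3, a rough sketch
# of the saving step could look like the following. The helpers local_path_for and
# save_image, and the 'index' fallback name, are hypothetical and not part of the original
# snippet. The demo writes placeholder bytes to a temporary directory instead of
# downloading anything.

```ruby
require 'uri'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: derive a local file path from the URL's path component.
# The query string is ignored; URLs with no file name fall back to 'index'.
def local_path_for(url, dir)
  name = File.basename(URI.parse(url).path.to_s)
  name = 'index' if name.empty? || name == '/'
  File.join(dir, name)
end

# Hypothetical helper: write the downloaded bytes in binary mode and return the path.
def save_image(url, body, dir)
  FileUtils.mkdir_p(dir)
  path = local_path_for(url, dir)
  File.open(path, 'wb') { |f| f.write(body) }
  path
end

Dir.mktmpdir do |dir|
  path = save_image('http://www.designbombs.com/images/logo.png?v=2', 'placeholder bytes', dir)
  puts File.basename(path) # prints "logo.png"
end
```

# Inside the loop above, the call would be save_image(url, response.to_s, 'images').
# Two URLs ending in the same file name overwrite each other in this sketch.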