Skip to content

Instantly share code, notes, and snippets.

@davebrace
Last active December 10, 2015 05:58
Show Gist options
  • Save davebrace/4390916 to your computer and use it in GitHub Desktop.
Save davebrace/4390916 to your computer and use it in GitHub Desktop.
Scrape a product web page to extract title / images / prices.
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(open('http://www.amazon.com/Datacolor-Spyder4Pro-S4P100-Colorimeter-Calibration/dp/B006TF37H8'))
images = doc.xpath('//img/@src').map {|n| n.value }
title = doc.xpath('//title').text
prices = doc.to_s.scan(/\$(\d+(?:\.\d{1,2})?)/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment