@bendavis78
Created May 3, 2016 16:54
# Explicit requires are harmless under Rails (Bundler normally loads these gems)
# and let the file run where auto-require is not set up.
require "httparty"
require "nokogiri"

namespace :scraper do
  desc "Print the title of each product on an Amazon listing page"
  task scrape: :environment do
    # The URL we want to scrape
    url = "http://www.amazon.com/s/ref=lp_2619525011_nr_n_5?fst=as%3Aoff&rh=n%3A2619525011%2Cn%3A%212619526011%2Cn%3A2686328011&bbn=2619526011&ie=UTF8&qid=1462292509&rnid=2619526011"
    # Get the raw HTML content
    response = HTTParty.get(url)
    html = response.body
    # Parse the HTML into a document we can query with CSS selectors
    doc = Nokogiri::HTML(html)
    # Get each product on the page (#atfResults ul li)
    doc.css("#atfResults ul li").each do |product|
      # Print the text content of the h2 element for each li item
      puts product.css("h2").text
    end
  end
end