@Sebx
Forked from PWSdelta/zappos_product_scraper.rb
Created July 15, 2018 04:29
A small Ruby script that demonstrates how to use Mechanize to scrape product details (title, SKU, main image and alternate images) from a list of Zappos.com product URLs and save them to a CSV file.
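
In outline, the script visits each URL, extracts four fields per page, and writes one CSV row per product. The sketch below captures that loop with the network step swapped out: `write_products` and `fetch` are hypothetical names, and `fetch` stands in for the Mechanize request plus CSS selectors, so the CSV-writing part can be exercised without contacting Zappos.com.

```ruby
require 'csv'

HEADER = %w[title sku image alt_images].freeze

# Write one CSV row per URL. `fetch` is a hypothetical stand-in for the
# Mechanize request: any callable that takes a URL and returns a Hash with
# :title, :sku, :image and :alt_images (an Array of image URLs).
def write_products(urls, path, fetch)
  CSV.open(path, "w") do |csv|
    csv << HEADER
    urls.each do |url|
      product = fetch.call(url)
      csv << [
        product[:title],
        product[:sku],
        product[:image],
        # Several image URLs are packed into a single cell, separated by "|".
        Array(product[:alt_images]).join("|")
      ]
    end
  end
end
```

Because `CSV.open` quotes fields as needed, a title that contains a comma still lands in a single column, which is not guaranteed when rows are assembled by hand and written with `File#<<`.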
```ruby
# http://nokogiri.org/Nokogiri/XML/Node.html#method-i-css
require 'mechanize'
require 'csv'

puts "Product Scraper!!!"
puts

urls = [
  "http://www.zappos.com/seavees-teva-universal-sandal-concrete",
  "http://www.zappos.com/teva-bomber-sandal-dark-olive",
  "http://www.zappos.com/teva-jetter-cigar"
]

file = "product_data.csv"

# A single Mechanize agent is reused for every request.
agent = Mechanize.new

# CSV.open quotes each field as needed, so values containing commas
# (common in product titles) stay in their own column.
CSV.open(file, "w") do |csv|
  csv << %w[title sku image alt_images]

  urls.each do |url|
    puts url
    page = agent.get(url)

    # Keep only the part of the page title before the first " - ".
    title = page.title.to_s.split(' - ').first.to_s.strip

    # Skip the four-character label in front of the SKU value.
    sku = page.search("#sku").inner_text[4..-1].to_s.strip

    # Main product image; guard against pages where the selector matches nothing.
    prod_image = page.search("#detailImage img").first
    image_src = prod_image ? prod_image[:src] : ""

    # All alternate image URLs, joined with "|" into a single CSV cell.
    alt_images = page.search("#productImages ul li a img").map { |img| img[:src] }.join("|")

    csv << [title, sku, image_src, alt_images]
  end
end

2.times { puts }
puts "Done!"
```
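
The title and SKU cleanup can be pulled out into small pure functions and checked without any network access. A minimal sketch follows; `clean_title` and `clean_sku` are hypothetical helper names, and it assumes page titles have the form "Product Name - ..." and that the `#sku` text begins with a four-character label. Neither format is confirmed beyond what the script itself implies, and the Zappos.com markup has likely changed since 2018.

```ruby
# Keep the part of a page title before the first " - " separator.
# Assumption: titles look like "Product Name - ...". With no separator,
# the whole (stripped) title is returned; nil becomes "".
def clean_title(raw)
  raw.to_s.split(" - ", 2).first.to_s.strip
end

# Drop the first four characters of the #sku text (assumed to be a label
# such as "SKU ") and trim surrounding whitespace. Strings shorter than
# four characters yield "".
def clean_sku(raw)
  raw.to_s[4..-1].to_s.strip
end

puts clean_title("Teva Jetter - Zappos.com") # prints "Teva Jetter"
puts clean_sku("SKU 7654321")                # prints "7654321"
```

Splitting with a limit of 2 keeps everything before the first separator even when the title contains several " - " sequences, and it avoids the error that slicing with `title.index(' - ')` raises on Ruby versions where `0..nil` is not a valid range.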