Created
December 12, 2012 17:07
Gist: arjunvenkat/4269615
build a scraper using Nokogiri
# open-uri fetches the page over HTTP; Nokogiri parses the returned HTML
require 'nokogiri'
require 'open-uri'

# Putting it all together
# ===============================================

# initialize a url and feed into Nokogiri
# (URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs)
url = "http://www.rottentomatoes.com/m/lincoln_2011/"
doc = Nokogiri::HTML(URI.open(url))

# Creates an array that contains all the rotten/fresh
# reviews on the page along with their respective critics

# drills down to the lowest level that still contains all necessary data
critics = doc.css('div#reviews div.quote_bubble')
reviews_array = []
critics.each do |critic|
  # drills down to pull out a critic's name
  name = critic.css('div.media_block_content div.bold')
  name = name.text.strip
  review = [name] # saves the name into an array called review

  # drills down to the element that may contain a fresh class
  fresh = critic.css('div.quote_contents div.fresh')

  # checks if a fresh class exists
  if fresh.empty?
    review << "rotten"
  else
    review << "fresh"
  end
  reviews_array << review
end
puts reviews_array.inspect
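
The script ends by printing the raw array of [name, verdict] pairs. As a rough sketch of what could be done with that array next, the snippet below tallies the verdicts and computes a fresh percentage. It is not part of the original gist: the helper names `verdict_counts` and `fresh_percentage` are hypothetical, and the sample data is made up so the snippet runs without fetching any page.

```ruby
# Hypothetical sample data in the same shape the scraper builds:
# one [critic_name, verdict] pair per review. The names are placeholders.
reviews_array = [
  ["Critic A", "fresh"],
  ["Critic B", "rotten"],
  ["Critic C", "fresh"]
]

# Counts how many reviews fall under each verdict ("fresh" or "rotten").
def verdict_counts(reviews)
  counts = Hash.new(0)
  reviews.each { |_name, verdict| counts[verdict] += 1 }
  counts
end

# Percentage of reviews marked "fresh", rounded to one decimal place.
# Returns nil for an empty list so "no data" is distinguishable from 0%.
def fresh_percentage(reviews)
  return nil if reviews.empty?
  fresh = reviews.count { |_name, verdict| verdict == "fresh" }
  (100.0 * fresh / reviews.size).round(1)
end

puts verdict_counts(reviews_array).inspect
puts fresh_percentage(reviews_array)
```

With a live scrape, the same two helpers could be called on the `reviews_array` the script builds, assuming the selectors above still match the page's markup.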