Created
October 24, 2017 17:22
scrapper_imdb
require 'open-uri' # Provides URI.open for fetching URLs
require 'nokogiri' # HTML ==> Nokogiri document

base_url = "http://www.imdb.com"
html = URI.open("#{base_url}/chart/top").read
html_doc = Nokogiri::HTML(html)

# Each matching anchor is one movie in the chart
html_doc.search('.titleColumn a').each do |element|
  title = element.text.strip
  link = element['href']
  actors = element['title'] # the anchor's title attribute

  # Fetch the movie's own page to read its summary.
  # URI.join avoids the double slash when link already starts with "/".
  movie_url = URI.join(base_url, link).to_s
  movie_html = URI.open(movie_url).read
  movie_doc = Nokogiri::HTML(movie_html)
  summary = movie_doc.search('.summary_text').text.strip

  movie_text = "#{title}\n#{actors}\n#{summary}\n"

  # One text file per movie; replace characters that are unsafe in file names
  file_path = "#{title.gsub(/[^[:alnum:]._-]+/, '_')}.txt"
  File.write(file_path, movie_text)
end
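
Two steps in the script above can be tried without any network access: building each movie URL from the anchor's href, and deriving a file name from the title. The sketch below isolates them as small helpers that use only the standard library. The names `BASE_URL`, `absolute_url` and `safe_filename`, and the sample href and title, are hypothetical and not part of the original script. The character whitelist in `safe_filename` is one possible choice, not the only reasonable one.

```ruby
require 'uri'

# Assumed site root, taken from the URL used in the script.
BASE_URL = 'http://www.imdb.com'

# Resolve an href (site-relative or already absolute) against the site root.
# URI.join handles the leading "/" so no double slash is produced.
def absolute_url(href, base = BASE_URL)
  URI.join(base, href).to_s
end

# Turn a movie title into a file name: every run of characters other than
# letters, digits, ".", "_" or "-" becomes a single underscore.
def safe_filename(title)
  "#{title.strip.gsub(/[^[:alnum:]._-]+/, '_')}.txt"
end

puts absolute_url('/title/tt0111161/')
puts safe_filename('The Godfather: Part II')
```

Compared with replacing only spaces, this also removes characters such as ":" or "/" that could produce an invalid path on some file systems.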