@kavyasukumar
Last active August 2, 2016 17:27
require 'kinto_box'
require 'json'
require 'nokogiri'
require 'httparty'
# Load the Kinto connection settings (server URL, bucket name) from datastore-config.json
config = JSON.parse(File.read('datastore-config.json'))
# Create a new Kinto connection
kinto = KintoBox.new(config['kinto_server'])
# Get the bucket object for this project
BUCKET = kinto.bucket(config['bucket'])
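# Model of a single movie record scraped from the IMDb list and stored in Kinto
class Movie
  # All Movie records live in the 'movies' collection of the project bucket
  @@collection = BUCKET.collection('movies')

  # html: the Nokogiri node for one '.lister-item' entry; rank: its 1-based position
  def initialize(html, rank)
    @rank = rank
    @title = html.css('.lister-item-content h3 a')[0].text
    @rating = html.css('.ratings-imdb-rating strong').text
    @poster_img_url = html.css('.lister-item-image a img')[0]['src']
  end

  # Hash sent to Kinto as the record's data
  def object_hash
    {
      'rank' => @rank,
      'title' => @title,
      'rating' => @rating,
      'posterImgUrl' => @poster_img_url
    }
  end

  # Create a new record for this movie in the 'movies' collection
  def push_to_kinto
    @@collection.create_record(object_hash)
  end

  # Remove every record currently stored in the 'movies' collection
  def self.delete_all_in_kinto
    @@collection.delete_records
  end
end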
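# Scraper script
puts 'scraping...'
url = 'http://www.imdb.com/search/title?groups=top_100&sort=user_rating'
page_html = HTTParty.get(url)
# Stop before touching Kinto if IMDb did not return the page
abort "Request failed with HTTP #{page_html.code}" unless page_html.code == 200
# Parse the response body (a String) rather than the HTTParty::Response object
parsed_html = Nokogiri::HTML(page_html.body)
movies = []
# Clear the collection so each run stores a fresh copy of the list
Movie.delete_all_in_kinto
# Each '.lister-item' is one movie; ranks start at 1
parsed_html.css('.lister-list .lister-item').each_with_index do |movie, i|
  new_movie = Movie.new(movie, i + 1)
  movies.push(new_movie)
  new_movie.push_to_kinto
end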
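The script reads `config['kinto_server']` and `config['bucket']` without checking that they exist, so a missing key reaches `KintoBox.new` or `kinto.bucket` as `nil`, and the resulting error likely surfaces inside the Kinto client rather than pointing at the config file. A minimal fail-fast loader is sketched below, assuming the file is a JSON object holding at least those two string keys. The helper name `load_datastore_config`, the `REQUIRED_KEYS` constant, and the sample server URL and bucket name are all hypothetical; the real file may hold more settings (credentials, for example), which the sketch simply passes through.

```ruby
require 'json'

# Keys the scraper reads from datastore-config.json
REQUIRED_KEYS = %w[kinto_server bucket].freeze

# Hypothetical helper: parse the config text and fail fast if a required key
# is missing or blank, before any request to Kinto or IMDb is made.
def load_datastore_config(json_text)
  config = JSON.parse(json_text)
  raise ArgumentError, 'datastore-config.json must contain a JSON object' unless config.is_a?(Hash)

  missing = REQUIRED_KEYS.reject { |key| config[key].is_a?(String) && !config[key].strip.empty? }
  raise KeyError, "datastore-config.json is missing: #{missing.join(', ')}" unless missing.empty?

  config
end

# Placeholder values, not a real server or bucket
sample = '{"kinto_server": "https://kinto.example.com/v1", "bucket": "imdb-top-100"}'
config = load_datastore_config(sample)
puts config['bucket'] # prints imdb-top-100
```

In the script itself, the loading line would then read `config = load_datastore_config(File.read('datastore-config.json'))`.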
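Because `Movie.delete_all_in_kinto` runs before any movie is parsed, a page whose markup no longer matches the selectors leaves the `movies` collection empty, and an exception inside `Movie.new` leaves it partially filled. One way to narrow that window, sketched below, is to build every record first and clear the collection only once there is something to write. `InMemoryCollection` is a hypothetical stand-in that mimics only the two collection calls the script uses (`create_record` and `delete_records`) so the ordering can be tried without a Kinto server; `replace_all` and its symbol-keyed input hashes are also assumptions, not part of the original script or of kinto_box.

```ruby
# Hypothetical in-memory stand-in for a Kinto collection. It mimics only the
# two calls the scraper makes: create_record and delete_records.
class InMemoryCollection
  attr_reader :records

  def initialize
    @records = []
  end

  # Store one record's data and return it
  def create_record(data)
    @records << data
    data
  end

  # Drop every stored record
  def delete_records
    @records.clear
  end
end

# Hypothetical helper: turn scraped items into records first, and only clear
# the collection once at least one record is ready to be written.
def replace_all(collection, scraped_items)
  records = scraped_items.each_with_index.map do |item, i|
    { 'rank' => i + 1, 'title' => item.fetch(:title), 'rating' => item.fetch(:rating) }
  end
  raise ArgumentError, 'no movies scraped; keeping existing records' if records.empty?

  collection.delete_records
  records.each { |record| collection.create_record(record) }
  records.size
end

# Usage with made-up data
collection = InMemoryCollection.new
collection.create_record({ 'rank' => 1, 'title' => 'Old entry', 'rating' => '9.0' })
replace_all(collection, [{ title: 'Movie A', rating: '9.2' }, { title: 'Movie B', rating: '9.1' }])
puts collection.records.map { |r| r['title'] }.inspect # prints ["Movie A", "Movie B"]
```

This ordering does not make the update atomic: if a `create_record` call fails partway through the loop, the collection is still left partially filled.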