Last active
May 17, 2017 20:20
scraper-psuedocode-exmaple
require 'nokogiri'
require 'open-uri'

module Producers
  class DataModelA
    # ... Some Ruby magic to return the correct subclass for a state

    # Fetches and parses the subclass's URL once, then memoizes the document.
    def page
      @page ||= Nokogiri::HTML(URI.open(self.class::URL))
    end
  end

  class SpecificState < DataModelA
    URL = "http://www.example.com/foo"

    def scraped_data_point_a
      page.xpath('some xpath where our data is')
    end

    def scraped_data_point_b
      page.xpath('some other xpath')
    end
  end
end
module Consumers
  class GenericDataModelA
    attr_accessor :scrape_obj, :persistence_object

    def initialize(scrape_obj, persistence_object)
      self.scrape_obj = scrape_obj
      self.persistence_object = persistence_object
    end

    # Copies each scraped_* value from the producer onto the matching setter.
    def persist!
      %i(data_point_a data_point_b).each do |meth|
        persistence_object.send("#{meth}=", scrape_obj.send("scraped_#{meth}"))
      end
    end
  end
end
require 'ostruct'

class ScraperJob
  def perform
    # Could iterate through the specific producers, passing each one into the generic consumer. Since all of the
    # scrape/producer classes have the same data methods, you're free to define them for arbitrary pages.
    scraper = Producers::SpecificState.new
    persistence_obj = OpenStruct.new # add methods, or use AR, or something else.
    Consumers::GenericDataModelA.new(scraper, persistence_obj).persist!
  end
end
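
The pseudocode leaves two pieces open: how the base class returns the correct subclass for a state, and how the job iterates over several producers. The following is a minimal, self-contained sketch of one way that could look, not the original authors' implementation. The `handles` macro, the `registry` and `for_state` helpers, the `Ohio` and `Texas` producers, and the `Record` struct are all hypothetical names introduced for illustration. The producers return canned strings instead of scraping so the example runs without a network or Nokogiri.

```ruby
module Producers
  class DataModelA
    # Hypothetical registry mapping a state key to the producer subclass.
    def self.registry
      @registry ||= {}
    end

    # Hypothetical class macro: a subclass declares which state it scrapes.
    def self.handles(state)
      DataModelA.registry[state] = self
    end

    # Looks up the producer class for a state; raises KeyError if none is registered.
    def self.for_state(state)
      DataModelA.registry.fetch(state)
    end
  end

  # Stand-in producers: canned values replace the real xpath lookups.
  class Ohio < DataModelA
    handles :ohio

    def scraped_data_point_a
      'ohio-a'
    end

    def scraped_data_point_b
      'ohio-b'
    end
  end

  class Texas < DataModelA
    handles :texas

    def scraped_data_point_a
      'texas-a'
    end

    def scraped_data_point_b
      'texas-b'
    end
  end
end

module Consumers
  class GenericDataModelA
    FIELDS = %i[data_point_a data_point_b].freeze

    def initialize(scrape_obj, persistence_object)
      @scrape_obj = scrape_obj
      @persistence_object = persistence_object
    end

    # Copies every scraped_* value onto the persistence object and returns it.
    def persist!
      FIELDS.each do |field|
        @persistence_object.public_send("#{field}=", @scrape_obj.public_send("scraped_#{field}"))
      end
      @persistence_object
    end
  end
end

# Plain Struct standing in for OpenStruct, an ActiveRecord model, or similar.
Record = Struct.new(:data_point_a, :data_point_b)

class ScraperJob
  # Runs the generic consumer once per requested state and returns
  # a hash of state => populated record.
  def perform(states)
    states.each_with_object({}) do |state, results|
      scraper = Producers::DataModelA.for_state(state).new
      results[state] = Consumers::GenericDataModelA.new(scraper, Record.new).persist!
    end
  end
end
```

Calling `ScraperJob.new.perform(%i[ohio texas])` returns one populated `Record` per state. An explicit `handles` declaration was chosen here over inferring the key from the class name because it keeps the lookup obvious, but an `inherited` hook would be a reasonable alternative.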
This scraping pattern was built in collaboration with Ryan Long (https://github.com/rtlong) and JD Guzman (https://github.com/jdguzman)