Created July 13, 2012 23:10
Thresher - Separates the Wheat from the Chaff
require "rake"
require "csv"
require_relative '../../lib/thresher_sketch'

namespace :chicago_new do
  namespace :contracts do
    desc "Load city contracts"
    task :load => :environment do
      file_name = path_to(ENV["contracts_file"] || "Contracts.csv")
      ContractsThresher.thresh file_name, :rejections_to => "rejected_contracts.csv"
    end
  end
end
require "date"
require_relative 'thresher'
require_relative '../app/models/vendor'
require_relative '../app/models/contract'

class ContractsThresher # named after the source rather than a model, but they're conflated in this case
  extend Thresher

  create_or_find Vendor, :by => :external_id do
    fields :external_id => "Vendor ID",
           :name => "Vendor Name",
           :address1 => "Address 1",
           :address2 => "Address 2",
           :city => "City",
           :state => "State",
           :zipcode => "Zip"
    reject_if_blank :all
  end

  upsert Contract, :by => [:purchase_order, :revision] do
    fields :purchase_order => "Purchase Order (Contract) Number",
           :vendor => {:association => Vendor, :by => {:external_id => "Vendor ID"}}, # a little repetitive, but anything else was too much magic for me
           :revision => "Revision Number",
           :description => "Purchase Order Description",
           :specification => "Specification Number",
           :award_amount => {:column => "Award Amount", :formatting => format_award_amount}, # possible by some context/method_missing black magic. Would appreciate your thoughts on this - feels a little too evil, but the alternatives (wrapping with procs, accepting a symbol and calling #send behind the scenes) feel less obvious to use. Also this method enforces a standalone method to do the formatting, which I like.
           :start_at => {:column => "Start Date", :formatting => format_date},
           :end_at => {:column => "End Date", :formatting => format_date},
           :approval_at => {:column => "Approval Date", :formatting => format_date},
           :contract_type => "Contract Type",
           :department => "Department",
           :procurement_type => "Procurement Type"
    reject_if_blank :all
  end

  # Trims surrounding whitespace from the raw award amount.
  def format_award_amount(raw_award)
    raw_award.strip
  end

  # Parses dates written as month/day/four-digit year, e.g. "07/13/2012".
  def format_date(raw_date)
    Date.strptime(raw_date, '%m/%d/%Y')
  end
end
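The `:formatting => format_award_amount` lines depend on bare method names being captured while the `fields` block runs. Since the Thresher library itself is not shown here, the following is only a guess at how that could work: a minimal sketch in which `FieldContext`, `FormatterRef`, and `DemoThresher` are hypothetical names, not part of Thresher.

```ruby
require "date"

# Hypothetical placeholder that records which formatter to call later.
FormatterRef = Struct.new(:name)

# Hypothetical context object in which a `fields` block is evaluated.
class FieldContext
  attr_reader :mappings

  def initialize
    @mappings = {}
  end

  def fields(mapping)
    @mappings.merge!(mapping)
  end

  # A bare, argument-free name starting with "format_" is captured as a
  # FormatterRef instead of raising NameError.
  def method_missing(name, *args)
    return super unless args.empty? && name.to_s.start_with?("format_")
    FormatterRef.new(name)
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("format_") || super
  end
end

# Hypothetical class that owns the real formatting method.
class DemoThresher
  def format_date(raw_date)
    Date.strptime(raw_date, "%m/%d/%Y")
  end
end

context = FieldContext.new
context.instance_eval do
  fields({:start_at => {:column => "Start Date", :formatting => format_date}})
end

# Later, when a row is processed, the captured name is dispatched for real.
ref = context.mappings[:start_at][:formatting]
parsed = DemoThresher.new.public_send(ref.name, "07/13/2012")
puts parsed # 2012-07-13
```

Limiting the capture to names with a `format_` prefix keeps ordinary typos raising `NameError`, which takes some of the "evil" out of the trick.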
I like where this is headed. This will be immensely helpful for generically loading new structured data within the scope of an endpoint. Will validation rules be applied in some way when an upsert finds an existing record? Maybe a method hook contract, such as "thresher_record_exists?" and "thresher_handle_existing_record", for when the underlying thresher lib detects a duplicate?
Great start!
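To make the hook idea concrete, here is a rough sketch of how an upsert step might defer to those two hooks when a model defines them. Everything in it is an assumption for illustration: `DemoContract` is an in-memory stand-in rather than the real `Contract` model, and the hook signatures and the standalone `upsert` method are hypothetical.

```ruby
# Hypothetical in-memory stand-in for a model; not the real Contract class.
class DemoContract
  STORE = {}

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  def self.store
    STORE
  end

  def self.key_for(attributes)
    attributes.values_at(:purchase_order, :revision)
  end

  # Hook: does a record with the same key already exist?
  def self.thresher_record_exists?(attributes)
    store.key?(key_for(attributes))
  end

  # Hook: decide what to do with a duplicate. This example policy only
  # overwrites when the incoming award amount is larger.
  def self.thresher_handle_existing_record(existing, attributes)
    if attributes[:award_amount] > existing.attributes[:award_amount]
      existing.attributes.merge!(attributes)
    end
    existing
  end
end

# Hypothetical upsert step that calls the hooks only if the model has them.
def upsert(model, attributes)
  key = model.key_for(attributes)
  if model.respond_to?(:thresher_record_exists?) && model.thresher_record_exists?(attributes)
    model.thresher_handle_existing_record(model.store[key], attributes)
  else
    model.store[key] = model.new(attributes)
  end
end

upsert(DemoContract, {:purchase_order => "1001", :revision => 0, :award_amount => 500})
upsert(DemoContract, {:purchase_order => "1001", :revision => 0, :award_amount => 750})
record = DemoContract.store[["1001", 0]]
puts record.attributes[:award_amount] # 750
```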
Chad:
Good idea on the hooks. I'll make sure to add that. As for the other files, you can see them at https://github.com/citizenparker/chicago-finances/commit/139cded2d83ba2b04ce337d9c00fbd7e11c89ad4. As always, feedback welcome.
Hey Scott,
I do not see lib/thresher_sketch.rb. Could you provide it as part of a gist as well? It would help me understand the class in context. Comment added below in the meantime.