@seven1m
Created March 26, 2010 04:36
Mr. Scott needed some Wufoo importing love...
#!/usr/bin/env ruby
# dedup_emails_for_scott.rb
# usage:
# ruby dedup_emails_for_scott.rb file_to_upload.csv existing_entries.csv file_to_upload_deduped.csv
EMAIL_COL = 'Email' # change this if your column name is not 'Email'
if ARGV.length == 3
  upload_filename, existing_filename, output_filename = ARGV
else
  puts "Usage: ruby dedup_emails_for_scott.rb file_to_upload.csv existing_entries.csv file_to_upload_deduped.csv"
  puts "The first 2 files must already exist. The last filename will be written."
  exit(1)
end
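require 'rubygems'
require 'fastercsv' # Ruby 1.8 gem; on Ruby 1.9+ the same API ships in the standard library as 'csv' (class CSV)

# Lowercase an email for comparison. nil-safe, so blank cells (including
# emails blanked out below) do not raise NoMethodError.
def normalize(email)
  email.to_s.downcase
end

existing = FasterCSV.parse(File.read(existing_filename), :headers => true)
abort "#{existing_filename} has no data rows" if existing.first.nil?
headers = existing.first.to_hash.keys.sort

FasterCSV.open(output_filename, 'w') do |output|
  output << headers
  FasterCSV.open(upload_filename, 'r', :headers => true) do |csv|
    csv.each do |record|
      email = normalize(record[EMAIL_COL])
      # Blank out the email (the row itself is kept) if it already appears in
      # the existing entries or earlier in the upload file.
      if !email.empty? && existing.detect { |r| normalize(r[EMAIL_COL]) == email }
        record[EMAIL_COL] = nil
      end
      # Remember this record so later rows in the upload are checked against it.
      existing << record
      output << headers.map { |h| record[h] }
    end
  end
end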
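
The script above rescans every existing row for each uploaded record, which can get slow on large files. A minimal sketch of the same idea with a `Set` of already-seen emails follows. It assumes Ruby 1.9+ with the standard `csv` library instead of the `fastercsv` gem, and it works on CSV strings rather than files so it is easy to try out. The names `EMAIL_HEADER`, `normalize_email` and `blank_duplicate_emails` are hypothetical and not part of the original script.

```ruby
require 'csv'
require 'set'

EMAIL_HEADER = 'Email' # hypothetical constant; mirrors EMAIL_COL in the script

# Normalize an email for comparison; nil and empty values become nil.
def normalize_email(value)
  email = value.to_s.downcase
  email.empty? ? nil : email
end

# Returns the upload CSV as a string, with the email blanked out on any row
# whose email already appears in existing_csv or earlier in upload_csv.
# Rows are kept; only the email cell is cleared, as in the script.
def blank_duplicate_emails(upload_csv, existing_csv)
  existing = CSV.parse(existing_csv, headers: true)
  headers = existing.headers.sort

  # Collect every non-blank email from the existing entries.
  seen = Set.new
  existing.each do |row|
    email = normalize_email(row[EMAIL_HEADER])
    seen << email if email
  end

  CSV.generate do |output|
    output << headers
    CSV.parse(upload_csv, headers: true).each do |record|
      email = normalize_email(record[EMAIL_HEADER])
      # Set#add? returns nil when the value was already present.
      record[EMAIL_HEADER] = nil if email && !seen.add?(email)
      output << headers.map { |h| record[h] }
    end
  end
end
```

Calling `blank_duplicate_emails(File.read(upload_filename), File.read(existing_filename))` and writing the result to `output_filename` would slot this into the script's flow. Set membership is checked once per record, so the work grows roughly linearly with the number of rows.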