Created
May 16, 2014 18:14
-
-
Save jackrg/76ade1724bd816292e4e to your computer and use it in GitHub Desktop.
Add bulk import functionality to Rails Active Record (add this file to config/initializers, call <model>.import!(array-of-record-hashes))
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class ActiveRecord::Base | |
def self.import!(record_list) | |
raise ArgumentError "record_list not an Array of Hashes" unless record_list.is_a?(Array) && record_list.all? {|rec| rec.is_a? Hash } | |
return record_list if record_list.empty? | |
(1..record_list.count).step(1000).each do |start| | |
key_list, value_list = convert_record_list(record_list[start-1..start+999]) | |
sql = "INSERT INTO #{self.table_name} (#{key_list.join(", ")}) VALUES #{value_list.map {|rec| "(#{rec.join(", ")})" }.join(" ,")}" | |
self.connection.insert_sql(sql) | |
end | |
return record_list | |
end | |
def self.convert_record_list(record_list) | |
# Build the list of keys | |
key_list = record_list.map(&:keys).flatten.map(&:to_s).uniq.sort | |
value_list = record_list.map do |rec| | |
list = [] | |
key_list.each {|key| list << ActiveRecord::Base.connection.quote(rec[key] || rec[key.to_sym]) } | |
list | |
end | |
# If table has standard timestamps and they're not in the record list then add them to the record list | |
time = ActiveRecord::Base.connection.quote(Time.now) | |
for field_name in %w(created_at updated_at) | |
if self.column_names.include?(field_name) && !(key_list.include?(field_name)) | |
key_list << field_name | |
value_list.each {|rec| rec << time } | |
end | |
end | |
return [key_list, value_list] | |
end | |
end |
@jackrg how might you write this if you wanted to selectively insert columns from your raw data (e.g. in a csv, say you only want column 1, 3, and 4 to save to your ActiveRecord table instead of all columns 1-4)? I know how to do that using CSV.foreach, but it's taking 5 minutes to import 20,000 rows.
This is vulnerable to sql injection - you need to escape values!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I honestly disagree with baking in the step functionality to that method anyways. Why should that method care about doing anything but preparing the bulk insert statement? It shouldn't be throttling the input, the user shouldn't be passing things to it that slow it down, and if they do, they should be adjusting their input size.
The cleaner implementation is to avoid adding that logic into the method in the first place, for all those suggesting he simply fix the resulting logic bug.