If you need to manipulate existing data when your code is deployed, there are two main ways to do it:
- Create a rake task to migrate the data after the code is deployed. This is ideal for more complex data migrations.
- Use ActiveRecord models in a migration. This is acceptable for smaller data manipulations.
Regardless of the method you use, make sure to test your migrations before submitting them.
The problem with using models in migrations is that the migration can fail if the model's logic changes later, which is a big pain when deploying to production. However, a rake task can be overkill for a simple manipulation. Here are some ways to minimize the risk of updating data in migrations.
SQL doesn't care about validations or any of the other logic that comes with ActiveRecord models, so executing a raw query is less likely to break when the models change. However, raw SQL also bypasses those safeguards, so a mistake in the query is written straight to the database.
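As a rough sketch of the raw SQL approach, the backfill below is a single UPDATE statement that never touches a model class. The constant `BACKFILL_MODEM_STATUS_SQL` and the helper `backfill_modem_status` are hypothetical names chosen for this example. The `modems` table and `status` column are borrowed from the migration example in this article. The helper assumes only that it is handed an object that responds to `execute`, such as an ActiveRecord connection.

```ruby
# Hypothetical constant: the whole data change as one reviewable statement.
# The WHERE clause limits the update to rows that have no status yet,
# so running it a second time changes nothing.
BACKFILL_MODEM_STATUS_SQL = <<~SQL
  UPDATE modems
  SET status = 'active'
  WHERE status IS NULL
SQL

# Hypothetical helper: runs the backfill on any object that responds to
# #execute (assumed here to be an ActiveRecord connection).
def backfill_modem_status(connection)
  connection.execute(BACKFILL_MODEM_STATUS_SQL)
end
```

Inside a migration, the same statement could be run from `up` with `execute BACKFILL_MODEM_STATUS_SQL`. From a rake task, `backfill_modem_status(ActiveRecord::Base.connection)` would do the same.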
Stubbing out a model in your migrations has two main advantages:
- Guards against the case where a model is removed from the codebase but is still being called in a migration.
- Prevents validations from being run and eliminates overhead from associations.
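class AddStatusToModem < ActiveRecord::Migration
  # Stub model local to this migration: no validations, callbacks, or associations.
  class Modem < ActiveRecord::Base
  end

  def up
    add_column :modems, :status, :string
    # Reload column metadata so the stub sees the new status column.
    Modem.reset_column_information
    Modem.find_each do |modem|
      modem.status = 'active'
      modem.save!
    end
  end

  def down
    remove_column :modems, :status
  end
end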
The call to reset_column_information ensures that the Modem model is updated and has access to the new status column.
If you are going to use models in your migrations, this is how it should be done.
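The stub takes effect because Ruby resolves a constant in the enclosing lexical scope first, so code inside the migration class finds the nested `Modem` before the application's model. The plain-Ruby sketch below shows only that lookup rule, with no Rails involved. Both classes are simplified stand-ins, and their `save!` bodies (including the error message) are made up for illustration.

```ruby
# Stand-in for the application model, with a validation that was added
# after the migration was written.
class Modem
  def save!
    raise "Validation failed: Serial number can't be blank"
  end
end

# Stand-in for the migration class. The nested Modem shadows the top-level
# one for all code written inside AddStatusToModem.
class AddStatusToModem
  class Modem
    def save!
      true
    end
  end

  # Returns the class that the bare name Modem refers to in this scope.
  def model_used
    Modem
  end

  def up
    Modem.new.save!
  end
end
```

Here `AddStatusToModem.new.model_used` is `AddStatusToModem::Modem`, so `up` never runs the application model's validation.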
Handling complex data migrations in a rake task is a good idea.
To create a custom rake task:
rails g task data_migration set_modem_status
Then populate it with your data migration:
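namespace :data_migration do
  desc "Sets the default modem status"
  task set_modem_status: :environment do
    # Keep the timestamps unchanged on the records we touch.
    ActiveRecord::Base.record_timestamps = false
    begin
      Modem.find_each do |modem|
        begin
          modem.status = 'active'
          modem.save!
        rescue => e
          # Report the failure and continue with the next record.
          puts "Error updating #{modem.id}: #{e.message}"
        end
      end
    ensure
      # Restore timestamp tracking even if the task aborts.
      ActiveRecord::Base.record_timestamps = true
    end
  end
end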
There are a few notable things about this task:
- Setting ActiveRecord::Base.record_timestamps = false prevents ActiveRecord from updating the timestamps on all the records we are touching.
- Wrapping the updates in a begin/rescue/end block gives us the opportunity to catch errors and report them so we can handle problematic records later.
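To make "handle problematic records later" concrete, one option is to collect the IDs of the records that failed instead of only printing them. The sketch below is plain Ruby so it can run outside Rails. `FakeModem` and `set_status` are hypothetical names, and the `broken` flag exists only to simulate a record whose save fails.

```ruby
# Hypothetical stand-in for a model record; save! fails when broken is true.
FakeModem = Struct.new(:id, :status, :broken) do
  def save!
    raise ArgumentError, "cannot save modem #{id}" if broken
    true
  end
end

# Sets the status on every record, rescuing per-record failures so that one
# bad record does not stop the run. Returns the IDs that could not be saved.
def set_status(records, status)
  failed_ids = []
  records.each do |record|
    begin
      record.status = status
      record.save!
    rescue StandardError => e
      warn "Error updating #{record.id}: #{e.message}"
      failed_ids << record.id
    end
  end
  failed_ids
end
```

In a real task, the returned IDs could be printed at the end or written to a file, giving a list to investigate once the run has finished.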
Stubbing out models can also help minimize the chance of failure in rake tasks.
First, pull down a dump of the production database with rake repl, then run your migrations and verify everything looks right.