This is a generic guideline that applies mostly to Ruby on Rails and Postgres, but it can be useful for running any database migrations in CI.
- Table of contents
- Zero downtime migrations
- Resources
- Basic tenets
- Migration status
- Locks
- Rollbackable migrations
- Adding a column
- Adding a column with a default value
- Adding an index
- Adding a column with a default and a not null constraint and an index
- Removing a column
- Adding a json column
- Modifying existing schema
- Data migrations (backfilling data)
- Transactions
- Data modeling
- Data access queries (selects)
- Security
- Common gotchas
- 🚨 Don't break production 🚨
- 🙅🏾 Don't leave the database in a broken state
- 😭 Don't block fellow developers from deploying their code
- 🚧 Always remember that users will be reading and writing to the DB in between each state
- 🤦 Always remember that migrations are code, they cannot be run without loading the application, so the app deploys before migrations are run
Do: Complete a migration successfully or ensure that it is completely undone upon failure.
Don't: Leave the database in a state different from when the failed migration started.
Note: when adding an index concurrently fails, the index is left in a broken (invalid) state and must be manually removed, e.g. with a `remove_index` in the down migration.
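A minimal sketch of that cleanup, assuming a failed concurrent index build on a hypothetical `users.name` column:

```ruby
class IndexUsersOnName < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def up
    add_index :users, :name, algorithm: :concurrently
  end

  def down
    # Drops the leftover invalid index if the concurrent build failed partway
    remove_index :users, :name
  end
end
```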
Do: Put operations that take `ACCESS EXCLUSIVE` locks into their own transactions so they can complete quickly.

Don't: Take `ACCESS EXCLUSIVE` locks within long-running transactions, as they block reads/writes to the whole table.

Note: If unavoidable, minimize the work the exclusive lock has to do, e.g. set the column default in a separate migration from adding the column.

Operations that take an `ACCESS EXCLUSIVE` lock include:

- `ALTER TABLE`
- `DROP TABLE`
- `TRUNCATE`
- `REINDEX`
- `CLUSTER`
- `VACUUM FULL`
Always ensure your migrations can be rolled back
Do: Use `from:` and `to:` when adding defaults, or use `up` and `down` methods if `change` cannot be rolled back

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def change
    change_column_default :users, :some_column, from: 'old_default', to: 'new_default'
  end
end
```

Don't: Create a migration with a `change` method that cannot be rolled back

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def change
    change_column_default :users, :some_column, "default_value"
  end
end
```
Migration definitions that can be rolled back with the `change` method
Do: Add a column in its own PR and fully deploy it before adding code that references the column

Don't: Add a column and try to use it in the same PR

This is because the app deploys and loads before the migration runs, so the new code would reference a column that does not yet exist.
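A minimal sketch of the two-PR sequence (hypothetical `nickname` column):

```ruby
# PR 1: contains only the migration; merge it and let it fully deploy first
class AddNicknameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :nickname, :string
  end
end

# PR 2: only after PR 1 is fully deployed may application code reference the column
user.update!(nickname: "db_fan") # `user` is some loaded User record
```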
https://github.com/ankane/strong_migrations#adding-a-column-with-a-default-value
https://blog.arkency.com/how-to-add-a-default-value-to-an-existing-column-in-a-rails-migration/
Do: Add the default value after adding the column

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def up
    add_column :users, :some_column, :text
    # The `from` and `to` attributes are not needed
    # because the down method removes the column
    change_column_default :users, :some_column, "default_value"
  end

  def down
    remove_column :users, :some_column
  end
end
```

Don't: Add the default in the `add_column` call

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def up
    add_column :users, :some_column, :text, default: "default_value"
  end
end
```
Exception: This is not necessary when you are creating a new table that is not referenced
Exception: Since PostgreSQL v11 adding a column with a static default value is no longer an unsafe operation: postgresql docs
> adding a column with a constant default value no longer means that each row of the table needs to be updated when the ALTER TABLE statement is executed. Instead, the default value will be returned the next time the row is accessed, and applied when the table is rewritten, making the ALTER TABLE very fast even on large tables.
>
> However, if the default value is volatile (e.g., clock_timestamp()) each row will need to be updated with the value calculated at the time ALTER TABLE is executed. To avoid a potentially lengthy update operation, particularly if you intend to fill the column with mostly nondefault values anyway, it may be preferable to add the column with no default, insert the correct values using UPDATE, and then add any desired default as described below.
```ruby
class AddDefaultToFooOnBar < ActiveRecord::Migration[5.2]
  def up
    add_column :bar, :foo, :uuid
    change_column_default :bar, :foo, 'gen_random_uuid()'
  end

  def down
    remove_column :bar, :foo
  end
end
```
For backfill tasks we use Ruby to iterate over records and update the column value using `SecureRandom.uuid`. If there is a need to do it in pure SQL, it can be done using a function as discussed in this SO post.
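The Ruby backfill might look something like this (a sketch, reusing the `bar`/`foo` names from the migration above and assuming a `Bar` model plus a hypothetical rake task name):

```ruby
# lib/tasks/backfill_bar_foo.rake (hypothetical file)
namespace :backfill do
  desc "Backfill bar.foo with generated UUIDs"
  task bar_foo: :environment do
    # Only select rows that still need a value so re-running the task is idempotent
    Bar.where(foo: nil).in_batches(of: 1_000) do |batch|
      batch.each { |bar| bar.update!(foo: SecureRandom.uuid) }
    end
  end
end
```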
https://github.com/ankane/strong_migrations#adding-an-index-postgres
https://thoughtbot.com/blog/how-to-create-postgres-indexes-concurrently-in
heroku guide to postgresql indexes
Do: Add the index concurrently in a separate migration with `disable_ddl_transaction!`

```ruby
# Migration #1
class AddNameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :name, :string
  end
end

# Migration #2
class IndexUsersOnName < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :users, :name, algorithm: :concurrently
  end
end
```

Don't: Add an index in the same transaction as adding the column

```ruby
# 1 migration
class AddNameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :name, :string, index: true
  end
end
```
Do: Disable the automatic index added by rails for references by specifying `index: false`

```ruby
# Migration #1
class AddAccountInfoToUsers < ActiveRecord::Migration[5.2]
  def change
    add_reference :users, :account_info, index: false
  end
end

# Migration #2
class IndexUsersOnAccountInfo < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :users, :account_info_id, algorithm: :concurrently
  end
end
```

Don't: Allow references to be indexed when they are added

```ruby
# 1 migration
class AddAccountInfoToUsers < ActiveRecord::Migration[5.2]
  def change
    add_reference :users, :account_info
  end
end
```
- Add column, add default, add index, create backfill rake task
- Don't forget to update dev seeds so your fellow developers can pull in your changes
- Add not-null constraint (after running rake task)
```ruby
# Migration 1:
class AddFooToBar < ActiveRecord::Migration[5.2]
  def up
    add_column :bar, :foo, :boolean # if it were a reference we would need: , index: false
    change_column_default :bar, :foo, false
  end

  def down
    remove_column :bar, :foo
  end
end

# Migration 2:
class IndexBarOnFoo < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :bar, :foo, algorithm: :concurrently
  end
end
```

```ruby
# 1 migration in the next PR
class AddNotNullToFoo < ActiveRecord::Migration[5.2]
  def change
    change_column_null :bar, :foo, false
  end
end
```
Process:
- fully deploy PR1
- run the rake task in both redama (staging) and production
- confirm in both staging and production db that no nulls exist
- deploy PR2
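The "confirm that no nulls exist" step above can be a quick console check (a sketch, using the `bar`/`foo` names from the example and assuming a `Bar` model):

```ruby
# Run in a staging/production console before deploying the NOT NULL constraint
Bar.where(foo: nil).count # => should be 0
```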
https://github.com/ankane/strong_migrations#removing-a-column
Do: Ignore the column in the ActiveRecord model, THEN create a migration to remove the column using `safety_assured`

```ruby
# For Rails 5+
class User < ApplicationRecord
  self.ignored_columns = ["some_column"]
end
```

After the above is fully deployed:

```ruby
class RemoveSomeColumnFromUsers < ActiveRecord::Migration[5.2]
  def change
    safety_assured { remove_column :users, :some_column }
  end
end
```

Don't: Remove a column before fully deploying the `ignored_columns` code 🙏

`safety_assured` does not actually assure safety. It tells strong_migrations that you know what you're doing and disables the checks provided by the gem (see gotchas at the bottom).
https://github.com/ankane/strong_migrations#adding-a-json-column-postgres
Do: Upgrade to Postgres 9.4+ and use `jsonb`

Don't: Use the `json` type, as it causes problems for `SELECT DISTINCT` queries
If you are stuck on an old version of postgres you can use a json column like this
Whenever you are modifying the existing schema, as opposed to adding or removing schema, you must take the following steps to ensure that the application is in a working state after each atomic deploy:
- Create the place for the new data (column or table)
- Write to both old and new places in the application code (every single place that this could happen)
- Backfill the missing data from the old place to the new place
- Double check the data to ensure everything matches and nothing unexpected was missed
- Convert the application to read from the new place
- Stop writing to the old place
- Remove the old place
In this particular example the steps are doubled in order to use the old namespace for the new schema
There are two main ways to accomplish step 2, writing to both the old and new places:

- temporarily override the setter (see the sketch after this list)
  - this is nice because it makes for clean diffs and the least amount of code change to remove later
  - watch out for any write to the column that bypasses the default setter
    - in Rails this includes: `[:write_attribute, :update_column, :update_columns, :update_all, :[]=]`
  - be careful if the old schema is written to the db differently than the new schema
    - ex: if changing from updating a single record to a versioned, append-only model
    - ex: when going from a `has_one` to a `belongs_to` you go from updating many fields to only updating a foreign key
    - ex: this will not work when changing to different tables/models
- meticulously comb through every place that data could be written
  - the obvious downside is that it could be a lot of different places
  - you could miss a spot, which is why it's always good to check that the data matches before switching the reads to the new way (step 4)
  - sometimes it may be the only option, giving flexibility you don't have with overriding setters
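A minimal sketch of the setter-override approach, assuming a hypothetical move from an old `status` column to a new `state` column on the same model:

```ruby
class Order < ApplicationRecord
  # Temporarily write to both the old and the new column whenever the old
  # setter is used. Remove this override once reads have switched to the new
  # column and the old column is ready to be dropped.
  def status=(value)
    super(value)          # keep writing the old column
    self[:state] = value  # also write the new column
  end
end
```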
Do: Modify data in an async task runner (such as a rake task in rails)
Don't: Modify data in a transactional database migration
If a rake task fails it doesn't matter, but if a migration fails it must be undone manually in order to resume production deploys.
- For complicated stuff we usually add a dry run to the rake task.
- It can also take a couple hours and it’s not a big deal.
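The dry run mentioned above can be as simple as an environment variable flag (a sketch with hypothetical task, model, and column names):

```ruby
# lib/tasks/backfill_new_field.rake (hypothetical file)
namespace :data do
  desc "Backfill some_models.new_field; pass DRY_RUN=false to actually write"
  task backfill_new_field: :environment do
    dry_run = ENV["DRY_RUN"] != "false"
    scope = SomeModel.where(new_field: nil)
    puts "#{scope.count} records to update (dry_run=#{dry_run})"
    scope.in_batches { |batch| batch.update_all(new_field: "value") } unless dry_run
  end
end
```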
- Never spawn asynchronous processes (Sidekiq jobs) inside transactions, because by the time the worker starts the passed id might not exist yet, or worse, might belong to a different record that was created afterwards.
- Do not publish the success of an event inside of a transaction, because we don't really know if it was a success yet. Collect the successes/failures and publish events after the transaction. EventStore.batched can help with this.
- Be conscious of transactions and only use them when operations need to be atomic. If it's ok to do one thing and fail the other without leaving broken data, then do that. For example, something like notification creation should always be idempotent (you can't create the same notification twice), so in this case it is fine to create the ones that succeed, then retry the failures without duplicating notifications.
- Nested ActiveRecord transactions do not behave as you might expect: the `ActiveRecord::Rollback` error raised by an inner transaction does not trigger a rollback in the outer transaction. It is best to avoid nested transactions for logical clarity, but this unexpected behavior can be safely avoided by using `transaction(joinable: false, requires_new: true)` (see the sketch after this list).
  - `joinable: false` tells all inner transactions to use a real or simulated nested transaction
  - `requires_new: true` on the inner transaction prevents joining to the outer
  - mysql and postgres don't actually support nested transactions, so savepoints are used (more info)
- Remember that any single ActiveRecord write is already wrapped in a transaction.
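A minimal sketch of the nested-transaction behavior and the `requires_new: true` workaround (hypothetical models):

```ruby
ActiveRecord::Base.transaction do
  user.update!(name: "new name")

  # Without requires_new: true this inner block joins the outer transaction and
  # ActiveRecord::Rollback is silently swallowed, so NOTHING is rolled back.
  # With requires_new: true a savepoint is created, so only the inner work rolls back.
  ActiveRecord::Base.transaction(requires_new: true) do
    audit_log.update!(note: "renamed")
    raise ActiveRecord::Rollback
  end

  # The user update above still commits when the outer block finishes.
end
```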
Most rake tasks should fit this paradigm. We are updating a bunch of records, but we are only selecting the records that have not been updated yet, so running the task is idempotent, and we don't want to roll back all the changes if one record fails to update. We only want to roll back if dependent changes within the block fail.
```ruby
LegacyPost.where.not(id: Post.pluck(:legacy_id)).find_each do |legacy_post|
  Post.transaction do
    new_post_attrs = legacy_post.attributes.merge(id: nil, legacy_id: legacy_post.id)
    post = Post.create!(new_post_attrs)
    legacy_post.comments.each do |comment|
      comment.update!(post: post)
    end
  end
end
```
This is less common and should only be done if failing to update one record makes all the records invalid, or if it is part of a transactional API call, etc., since large (long-running) transactions can cause other lock requests to build up.
```ruby
ActiveRecord::Base.transaction do
  batch = PaymentBatch.create!
  Payments.ready_to_process.find_each do |payment|
    BatchPaymentProcess.create!(
      payment: payment,
      batch: batch
    )
  end
  batch.update!(
    external_payment_process_id: ExternalPaymentProcessor::BatchProcess.create(batch).id
  )
end
```
Do: Use ActiveRecord bang (!) methods such as `save!`, `update!`, and `destroy!`, as these raise an exception and trigger a rollback.

Don't: Expect a failed `save`, `update`, or `destroy` (without the bang '!') to trigger a rollback, as these do not raise an exception.
This should only be done if you need to do something special with the failed record(s) or to use the exception for logging, etc.

Do: Rescue `ActiveRecord::RecordInvalid` outside of the transaction

```ruby
def do_something_to_2_records(job)
  Job.transaction do
    job.update!(title: 'cool')
    job.user.update!(name: 'coolest')
  end
rescue ActiveRecord::RecordInvalid => exception
  # the transaction has been rolled back
  # do something with exception here
end
```

Don't: Rescue `ActiveRecord::Rollback` and re-raise it
Do: 💅🏽 use NOT NULL constraints and set defaults for boolean columns
Don't: 🙅🏾 allow NULL and use three-valued boolean logic
If you are allowing nulls in a database boolean column to make form serialization easier, you are making the wrong trade-off. The database schema does not bend the knee to client level logic.
Why?:
- https://medium.freecodecamp.org/a-quick-and-thorough-guide-to-null-what-it-is-and-how-you-should-use-it-d170cea62840
- https://community.filemaker.com/thread/179479
- https://softwareengineering.stackexchange.com/questions/133600/should-i-store-false-as-null-in-a-boolean-database-field
- https://thoughtbot.com/blog/avoid-the-threestate-boolean-problem
- https://stackoverflow.com/questions/11185321/when-should-null-values-of-boolean-be-used
- https://stackoverflow.com/questions/2116719/sql-using-null-values-vs-default-values
- https://modern-sql.com/concept/three-valued-logic
- https://en.wikipedia.org/wiki/Law_of_excluded_middle
- https://www.red-gate.com/simple-talk/sql/t-sql-programming/how-to-get-nulls-horribly-wrong-in-sql-server/#ninth
- a compound index on (`a`, `b`) performs similarly to an index on (`a`) for queries that only do comparisons on `a`
- a compound index on (`a`, `b`) is not used for queries that only do comparisons on `b`
- a compound index performs better than individual indexes on (`a`) and (`b`) for queries that do comparisons on both columns
  - postgres is smart enough to use 2 single-column indexes, but performance has been shown to be much better for the compound index
The best recommendation I have found actually comes directly from the postgres docs:
> if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns, you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns. You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone. The last alternative is to create all three indexes, but this is probably only reasonable if the table is searched much more often than it is updated and all three types of query are common. If one of the types of query is much less common than the others, you'd probably settle for creating just the two indexes that best match the common types.
For the case of a polymorphic relationship, say `resource_id` and `resource_type`:

> A combination of the multicolumn index and a separate index on y would serve reasonably well.

So an index on:

- [`resource_id`, `resource_type`] (order matters)
- [`resource_type`]
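In Rails these two indexes could be added concurrently, one per migration as described above (a sketch, assuming a hypothetical polymorphic `notes` table):

```ruby
# Migration #1
class IndexNotesOnResourceIdAndType < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :notes, [:resource_id, :resource_type], algorithm: :concurrently
  end
end

# Migration #2
class IndexNotesOnResourceType < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :notes, :resource_type, algorithm: :concurrently
  end
end
```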
Do: use a join table relationship or a jsonb column
Don't: use an array type column
Most of the time if you are reaching for an array type column you would be better off normalizing the relationship into another table. If you think you absolutely must have an array type column, just use a jsonb column instead. 🤷🏽
If a `has_one` association is being used to define a relationship to a relatively static collection of records, this is often better suited as a `belongs_to` association.
Do: Reverse a `has_one` association to a `belongs_to` association and a `has_many` on the reflected model

```ruby
# columns: [:make, :model, :color_id]
class Car < ApplicationRecord
  # A car belongs to a color
  # yes it sounds funny but that's ok
  belongs_to :color
end

# columns: [:name, :hexcode, :red_value, :blue_value, :green_value]
class Color < ApplicationRecord
  has_many :cars
end
```

Don't: Create new `has_one` associations since it leads to bad data access models and event logging

```ruby
# columns: [:make, :model]
class Car < ApplicationRecord
  has_one :color
end

# columns: [:name, :car_id, :hexcode, :red_value, :blue_value, :green_value]
class Color < ApplicationRecord
  belongs_to :car
end
```
- Commonly this is essentially a more complicated enum often called an allowed value table (AVT)
- A humorous article on why
- Benefits:
- Colors table is much smaller since a single color is not repeated for every car
- Never need to update the colors table when a car's color changes
- Updating the color_id of the car is more semantic to the car's color changing
If you already have a `has_one`, you can convert it to a `belongs_to`. You essentially need to store the 'colors' in a new table and follow the steps to modify schema.
Do: Try doing an inner join to the subquery so the planner can more easily turn it into a CTE.

```sql
select posts.id, sum(comments.points)
from posts
join comments on posts.id = comments.post_id
join (
  select users.id
  from users
  where users.something = 'something'
) as real_users on posts.user_id = real_users.id
group by posts.id
```

Don't: Filter the result set of an outer query based on the result of an expensive subquery in the where clause.

```sql
select posts.id, sum(comments.points)
from posts
join comments on posts.id = comments.post_id
where posts.user_id in (
  select users.id
  from users
  where users.something = 'something'
)
group by posts.id
```
The above example is a bit contrived since the query is so simple, but as the subquery becomes more complex, the cost of performing it for every row of the outer result set becomes greater.
Counting and other aggregate methods are often the most expensive parts of a query.

- `count(DISTINCT n)` is especially slow and can often be avoided by better filtering the initial query
- Sometimes it can be beneficial to cache aggregate computations in another table
  - this can lead to much more complexity in the application, so weigh the costs against your performance needs
  - consider how often you will need to update these computations vs how often you will access the values
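One common Rails flavor of caching an aggregate is a counter cache (a sketch, assuming `Post`/`Comment` models and a `posts.comments_count` integer column):

```ruby
class Comment < ApplicationRecord
  # Rails keeps posts.comments_count up to date on comment create/destroy
  belongs_to :post, counter_cache: true
end

# Reads can then use the cached column instead of a COUNT query:
post.comments_count # instead of post.comments.count
```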
Do: Use `count(*)` as it is essentially psql syntax for `count()`.

Don't: Mess about with `count(1)` or `count(id)` as these can be slightly less efficient.
https://security.stackexchange.com/questions/93902/is-postgress-uuid-generate-v4-securely-random
Do: use `gen_random_uuid()` from the pgcrypto extension

Don't: use `uuid_generate_v4()` from the uuid-ossp extension

If you need to create cryptographically secure UUIDs (or ensure real randomness), do not use the `uuid_generate_v4()` function provided by `uuid-ossp`. That function falls back to a pseudorandom number generator (PRNG) when the cryptographically secure random number generator fails. Instead, use the `gen_random_uuid()` function from pgcrypto, which raises an exception when the secure RNG fails instead of falling back to a less secure implementation.
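In a Rails migration this usually just means enabling the pgcrypto extension (a sketch; the default-setting example appears earlier in this doc):

```ruby
class EnablePgcrypto < ActiveRecord::Migration[5.2]
  def change
    enable_extension "pgcrypto"
  end
end
```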
It is always good to set an aggressive db timeout for main application access. This means that when you are, say, updating a column for an entire table to a single default value, you must do it in batches so that it does not trigger a timeout.
Do: Use `MyModel.in_batches.update_all(new_field: 'default_value')`

Don't: Use `MyModel.update_all(new_field: 'default_value')` (without `in_batches`)
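If you need more control over batch size or pacing, `in_batches` also takes a block (a sketch with a hypothetical model):

```ruby
MyModel.in_batches(of: 10_000) do |batch|
  batch.update_all(new_field: 'default_value')
  sleep(0.01) # optional: give the DB and any replicas some breathing room
end
```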
Examples:
- 🤦🏿 Oftentimes someone new to strong_migrations will write a migration to remove a column in the normal Rails way. When running the migration they will see the strong_migrations error, with a link to the GitHub repo and a note to wrap the migration in `safety_assured { ... }`. They then simply wrap their migration in the safety_assured block, see that it runs successfully, and attempt to deploy. This is very dangerous: when the production app runs the migration it will remove the column, but the already-running application has cached its ActiveRecord attribute methods on each model, so you will get a lot of "column does not exist" errors that 500 the application until the column is added back or all application servers are restarted.
- 🥶 Writing 2 PRs at the same time and not waiting for the column-ignore PR to fully deploy before merging the column removal
Do: Ensure that the PR to ignore the column in the model is completely deployed before merging the PR to remove the column from the DB.
Don't: Merge the PR to remove the column from the DB before the deploy to ignore the column in the model is completely finished.
Do: Use `disable_ddl_transaction!` at the top of the migration class.

Don't: Use `commit_db_transaction` within a `change`, `up`, or `down` method

This is because using `commit_db_transaction` can lead to migration steps that do not get rolled back, which breaks the first tenet of safe migrations.
Do: Only perform a single concurrent task (such as adding an index concurrently) in a migration with `disable_ddl_transaction!`

```ruby
# Migration #1
class AddNameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :name, :string
  end
end

# Migration #2
class IndexUsersOnName < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :users, :name, algorithm: :concurrently
  end
end
```

Don't: Use `disable_ddl_transaction!` in a migration that changes the schema

```ruby
# 1 migration
class AddNameToUsers < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_column :users, :name, :string
    add_index :users, :name, algorithm: :concurrently
  end
end
```
Side note: some of this gist is outdated and needs to be updated for newer versions of Postgres. Most notably, regarding default values: since PostgreSQL v11, adding a column with a static default value is no longer an unsafe operation: https://www.postgresql.org/docs/11/ddl-alter.html

This does not apply to the `gen_random_uuid()` example though, as that default is volatile.