This is a generic guideline that applies mostly to Ruby on Rails and Postgres, but it can be useful for running any database migrations in CI.
- Table of contents
- Zero downtime migrations
- Resources
- Basic tenets
- Migration status
- Locks
- Rollbackable migrations
- Adding a column
- Adding a column with a default value
- Adding an index
- Adding a column with a default and a not null constraint and an index
- Removing a column
- Adding a json column
- Modifying existing schema
- Data migrations (backfilling data)
- Transactions
- Data modeling
- Data access queries (selects)
- Security
- Common gotchas
- 🚨 Don't break production 🚨
- 🙅🏾 Don't leave the database in a broken state
- 😭 Don't block fellow developers from deploying their code
- 🚧 Always remember that users will be reading and writing to the DB in between each state
- 🤦 Always remember that migrations are code, they cannot be run without loading the application, so the app deploys before migrations are run
Do: Complete a migration successfully or ensure that it is completely undone upon failure.
Don't: Leave the database in a state different from when the failed migration started.
Note: when adding an index concurrently fails, the index is left in a broken (invalid) state and must be manually removed, e.g. with a `remove_index` in the down migration.
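A minimal sketch of that cleanup, assuming a failed concurrent index build on a hypothetical `users.name` column:

```ruby
class IndexUsersOnName < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def up
    add_index :users, :name, algorithm: :concurrently
  end

  def down
    # Drops the leftover invalid index if the concurrent build failed partway
    remove_index :users, :name
  end
end
```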
Do: Put operations that take `ACCESS EXCLUSIVE` locks into their own transactions so they can complete quickly.

Don't: Take `ACCESS EXCLUSIVE` locks within long-running transactions, as they block reads/writes to the whole table.

Note: If unavoidable, minimize the work the exclusive lock has to do, e.g. set the column default in a separate migration from adding the column.

Operations that take an `ACCESS EXCLUSIVE` lock include:

- `ALTER TABLE`
- `DROP TABLE`
- `TRUNCATE`
- `REINDEX`
- `CLUSTER`
- `VACUUM FULL`
Always ensure your migrations can be rolled back
Do: Use `from:` and `to:` when adding defaults, or use `up` and `down` methods if `change` cannot be rolled back

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def change
    change_column_default :users, :some_column, from: 'old_default', to: 'new_default'
  end
end
```

Don't: Create a migration with a `change` method that cannot be rolled back

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def change
    change_column_default :users, :some_column, "default_value"
  end
end
```
Migration definitions that can be rolled back with the `change` method
Do: Add a column in its own PR and fully deploy it before adding code that references the column

Don't: Add a column and try to use it in the same PR

This is because the app deploys and loads before the migration runs, so the new code would reference a column that does not yet exist.
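A minimal sketch of the two-PR sequence (hypothetical `nickname` column):

```ruby
# PR 1: contains only the migration; merge it and let it fully deploy first
class AddNicknameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :nickname, :string
  end
end

# PR 2: only after PR 1 is fully deployed may application code reference the column
user.update!(nickname: "db_fan") # `user` is some loaded User record
```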
https://github.com/ankane/strong_migrations#adding-a-column-with-a-default-value
https://blog.arkency.com/how-to-add-a-default-value-to-an-existing-column-in-a-rails-migration/
Do: Add the default value after adding the column

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def up
    add_column :users, :some_column, :text
    # The `from` and `to` attributes are not needed
    # because the down method removes the column
    change_column_default :users, :some_column, "default_value"
  end

  def down
    remove_column :users, :some_column
  end
end
```

Don't: Add the default in the `add_column` call

```ruby
class AddSomeColumnToUsers < ActiveRecord::Migration[5.2]
  def up
    add_column :users, :some_column, :text, default: "default_value"
  end
end
```
Exception: This is not necessary when you are creating a new table that is not referenced
Exception: Since PostgreSQL v11 adding a column with a static default value is no longer an unsafe operation: postgresql docs
> adding a column with a constant default value no longer means that each row of the table needs to be updated when the ALTER TABLE statement is executed. Instead, the default value will be returned the next time the row is accessed, and applied when the table is rewritten, making the ALTER TABLE very fast even on large tables.
>
> However, if the default value is volatile (e.g., clock_timestamp()) each row will need to be updated with the value calculated at the time ALTER TABLE is executed. To avoid a potentially lengthy update operation, particularly if you intend to fill the column with mostly nondefault values anyway, it may be preferable to add the column with no default, insert the correct values using UPDATE, and then add any desired default as described below.
```ruby
class AddDefaultToFooOnBar < ActiveRecord::Migration[5.2]
  def up
    add_column :bar, :foo, :uuid
    change_column_default :bar, :foo, 'gen_random_uuid()'
  end

  def down
    remove_column :bar, :foo
  end
end
```
For backfill tasks we use Ruby to iterate over records and update the column value using `SecureRandom.uuid`. If there is a need to do it in pure SQL, it can be done using a function as discussed in this SO post.
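The Ruby backfill might look something like this (a sketch, reusing the `bar`/`foo` names from the migration above and assuming a `Bar` model plus a hypothetical rake task name):

```ruby
# lib/tasks/backfill_bar_foo.rake (hypothetical file)
namespace :backfill do
  desc "Backfill bar.foo with generated UUIDs"
  task bar_foo: :environment do
    # Only select rows that still need a value so re-running the task is idempotent
    Bar.where(foo: nil).in_batches(of: 1_000) do |batch|
      batch.each { |bar| bar.update!(foo: SecureRandom.uuid) }
    end
  end
end
```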
https://github.com/ankane/strong_migrations#adding-an-index-postgres
https://thoughtbot.com/blog/how-to-create-postgres-indexes-concurrently-in
heroku guide to postgresql indexes
Do: Add the index concurrently in a separate migration with `disable_ddl_transaction!`

```ruby
# Migration #1
class AddNameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :name, :string
  end
end

# Migration #2
class IndexUsersOnName < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :users, :name, algorithm: :concurrently
  end
end
```

Don't: Add an index in the same transaction as adding the column

```ruby
# 1 migration
class AddNameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :name, :string, index: true
  end
end
```
Do: Disable the automatic index added by rails for references by specifying `index: false`

```ruby
# Migration #1
class AddAccountInfoToUsers < ActiveRecord::Migration[5.2]
  def change
    add_reference :users, :account_info, index: false
  end
end

# Migration #2
class IndexUsersOnAccountInfo < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :users, :account_info_id, algorithm: :concurrently
  end
end
```

Don't: Allow references to be indexed when they are added

```ruby
# 1 migration
class AddAccountInfoToUsers < ActiveRecord::Migration[5.2]
  def change
    add_reference :users, :account_info
  end
end
```
- Add column, add default, add index, create backfill rake task
- Don't forget to update dev seeds so your fellow developers can pull in your changes
- Add not-null constraint (after running rake task)
```ruby
# Migration 1:
class AddFooToBar < ActiveRecord::Migration[5.2]
  def up
    add_column :bar, :foo, :boolean # if it were a reference we would need: , index: false
    change_column_default :bar, :foo, false
  end

  def down
    remove_column :bar, :foo
  end
end

# Migration 2:
class IndexBarOnFoo < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :bar, :foo, algorithm: :concurrently
  end
end
```

```ruby
# 1 migration in the next PR
class AddNotNullToFoo < ActiveRecord::Migration[5.2]
  def change
    change_column_null :bar, :foo, false
  end
end
```
Process:
- fully deploy PR1
- run the rake task in both redama (staging) and production
- confirm in both staging and production db that no nulls exist
- deploy PR2
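The "confirm that no nulls exist" step above can be a quick console check (a sketch, using the `bar`/`foo` names from the example and assuming a `Bar` model):

```ruby
# Run in a staging/production console before deploying the NOT NULL constraint
Bar.where(foo: nil).count # => should be 0
```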
https://github.com/ankane/strong_migrations#removing-a-column
Do: Ignore the column in the ActiveRecord model, THEN create a migration to remove the column using `safety_assured`

```ruby
# For Rails 5+
class User < ApplicationRecord
  self.ignored_columns = ["some_column"]
end
```

After the above is fully deployed:

```ruby
class RemoveSomeColumnFromUsers < ActiveRecord::Migration[5.2]
  def change
    safety_assured { remove_column :users, :some_column }
  end
end
```

Don't: Remove a column before fully deploying the `ignored_columns` code 🙏

`safety_assured` does not actually assure safety. It tells strong_migrations that you know what you're doing and disables the checks provided by the gem (see gotchas at the bottom).
https://github.com/ankane/strong_migrations#adding-a-json-column-postgres
Do: Upgrade to Postgres 9.4+ and use `jsonb`

Don't: Use the `json` type, as it causes problems for `SELECT DISTINCT` queries
If you are stuck on an old version of postgres you can use a json column like this
Whenever you are modifying the existing schema, as opposed to adding or removing schema, you must take the following steps to ensure that the application is in a working state after each atomic deploy:
- Create the place for the new data (column or table)
- Write to both old and new places in the application code (every single place that this could happen)
- Backfill the missing data from the old place to the new place
- Double check the data to ensure everything matches and nothing unexpected was missed
- Convert the application to read from the new place
- Stop writing to the old place
- Remove the old place
In this particular example the steps are doubled in order to use the old namespace for the new schema
There are two main ways to accomplish step 2, writing to both the old and new places:

- temporarily override the setter (see the sketch after this list)
  - this is nice because it makes for clean diffs and the least amount of code change to remove later
  - watch out for any write to the column that bypasses the default setter
    - in Rails this includes: `[:write_attribute, :update_column, :update_columns, :update_all, :[]=]`
  - be careful if the old schema is written to the db differently than the new schema
    - ex: if changing from updating a single record to a versioned, append-only model
    - ex: when going from a `has_one` to a `belongs_to` you go from updating many fields to only updating a foreign key
    - ex: this will not work when changing to different tables/models
- meticulously comb through every place that data could be written
  - the obvious downside is that it could be a lot of different places
  - you could miss a spot, which is why it's always good to check that the data matches before switching the reads to the new way (step 4)
  - sometimes it may be the only option, giving flexibility you don't have with overriding setters
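A minimal sketch of the setter-override approach, assuming a hypothetical move from an old `status` column to a new `state` column on the same model:

```ruby
class Order < ApplicationRecord
  # Temporarily write to both the old and the new column whenever the old
  # setter is used. Remove this override once reads have switched to the new
  # column and the old column is ready to be dropped.
  def status=(value)
    super(value)          # keep writing the old column
    self[:state] = value  # also write the new column
  end
end
```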
Do: Modify data in an async task runner (such as a rake task in rails)
Don't: Modify data in a transactional database migration
If a rake task fails it doesn't matter, but if a migration fails it must be undone manually in order to resume production deploys.
- For complicated stuff we usually add a dry run to the rake task.
- It can also take a couple hours and it’s not a big deal.
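The dry run mentioned above can be as simple as an environment variable flag (a sketch with hypothetical task, model, and column names):

```ruby
# lib/tasks/backfill_new_field.rake (hypothetical file)
namespace :data do
  desc "Backfill some_models.new_field; pass DRY_RUN=false to actually write"
  task backfill_new_field: :environment do
    dry_run = ENV["DRY_RUN"] != "false"
    scope = SomeModel.where(new_field: nil)
    puts "#{scope.count} records to update (dry_run=#{dry_run})"
    scope.in_batches { |batch| batch.update_all(new_field: "value") } unless dry_run
  end
end
```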
- Never spawn asynchronous processes (Sidekiq jobs) inside transactions, because by the time the worker starts the passed id might not exist yet, or worse, might belong to a different record that was created afterwards.
- Do not publish the success of an event inside of a transaction, because we don't really know if it was a success yet. Collect the successes/failures and publish events after the transaction. EventStore.batched can help with this.
- Be conscious of transactions and only use them when operations need to be atomic. If it's ok to do one thing and fail the other without leaving broken data, then do that. For example, something like notification creation should always be idempotent (you can't create the same notification twice), so in this case it is fine to create the ones that succeed, then retry the failures without duplicating notifications.
- Nested ActiveRecord transactions do not behave as you might expect: the `ActiveRecord::Rollback` error raised by an inner transaction does not trigger a rollback in the outer transaction. It is best to avoid nested transactions for logical clarity, but this unexpected behavior can be safely avoided by using `transaction(joinable: false, requires_new: true)` (see the sketch after this list).
  - `joinable: false` tells all inner transactions to use a real or simulated nested transaction
  - `requires_new: true` on the inner transaction prevents joining to the outer
  - mysql and postgres don't actually support nested transactions, so savepoints are used (more info)
- Remember that any single ActiveRecord write is already wrapped in a transaction.
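A minimal sketch of the nested-transaction behavior and the `requires_new: true` workaround (hypothetical models):

```ruby
ActiveRecord::Base.transaction do
  user.update!(name: "new name")

  # Without requires_new: true this inner block joins the outer transaction and
  # ActiveRecord::Rollback is silently swallowed, so NOTHING is rolled back.
  # With requires_new: true a savepoint is created, so only the inner work rolls back.
  ActiveRecord::Base.transaction(requires_new: true) do
    audit_log.update!(note: "renamed")
    raise ActiveRecord::Rollback
  end

  # The user update above still commits when the outer block finishes.
end
```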
Most rake tasks should fit this paradigm. We are updating a bunch of records, but we are only selecting the records that have not been updated yet, so running the task is idempotent, and we don't want to roll back all the changes if one record fails to update. We only want to roll back if dependent changes within the block fail.
```ruby
LegacyPost.where.not(id: Post.pluck(:legacy_id)).find_each do |legacy_post|
  Post.transaction do
    new_post_attrs = legacy_post.attributes.merge(id: nil, legacy_id: legacy_post.id)
    post = Post.create!(new_post_attrs)
    legacy_post.comments.each do |comment|
      comment.update!(post: post)
    end
  end
end
```
This is less common and should only be done if failing to update one record makes all the records invalid, or if it is part of a transactional API call, etc., since large (long-running) transactions can cause other lock requests to build up.
```ruby
ActiveRecord::Base.transaction do
  batch = PaymentBatch.create!
  Payments.ready_to_process.find_each do |payment|
    BatchPaymentProcess.create!(
      payment: payment,
      batch: batch
    )
  end
  batch.update!(
    external_payment_process_id: ExternalPaymentProcessor::BatchProcess.create(batch).id
  )
end
```
Do: Use ActiveRecord bang (!) methods such as `save!`, `update!`, and `destroy!`, as these raise an exception and trigger a rollback.

Don't: Expect a failed `save`, `update`, or `destroy` (without the bang '!') to trigger a rollback, as these do not raise an exception.
This should only be done if you need to do something special with the failed record(s) or to use the exception for logging, etc.

Do: Rescue `ActiveRecord::RecordInvalid` outside of the transaction

```ruby
def do_something_to_2_records(job)
  Job.transaction do
    job.update!(title: 'cool')
    job.user.update!(name: 'coolest')
  end
rescue ActiveRecord::RecordInvalid => exception
  # the transaction has been rolled back
  # do something with exception here
end
```

Don't: Rescue `ActiveRecord::Rollback` and re-raise it
Do: 💅🏽 use NOT NULL constraints and set defaults for boolean columns
Don't: 🙅🏾 allow NULL and use three-valued boolean logic
If you are allowing nulls in a database boolean column to make form serialization easier, you are making the wrong trade-off. The database schema does not bend the knee to client level logic.
Why?:
- https://medium.freecodecamp.org/a-quick-and-thorough-guide-to-null-what-it-is-and-how-you-should-use-it-d170cea62840
- https://community.filemaker.com/thread/179479
- https://softwareengineering.stackexchange.com/questions/133600/should-i-store-false-as-null-in-a-boolean-database-field
- https://thoughtbot.com/blog/avoid-the-threestate-boolean-problem
- https://stackoverflow.com/questions/11185321/when-should-null-values-of-boolean-be-used
- https://stackoverflow.com/questions/2116719/sql-using-null-values-vs-default-values
- https://modern-sql.com/concept/three-valued-logic
- https://en.wikipedia.org/wiki/Law_of_excluded_middle
- https://www.red-gate.com/simple-talk/sql/t-sql-programming/how-to-get-nulls-horribly-wrong-in-sql-server/#ninth
- a compound index on (`a`, `b`) performs similarly to an index on (`a`) for queries that only do comparisons on `a`
- a compound index on (`a`, `b`) is not used for queries that only do comparisons on `b`
- a compound index performs better than individual indexes on (`a`) and (`b`) for queries that do comparisons on both columns
  - postgres is smart enough to use 2 single-column indexes, but performance has been shown to be much better for the compound index
The best recommendation I have found actually comes directly from the postgres docs:
> if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns, you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns. You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone. The last alternative is to create all three indexes, but this is probably only reasonable if the table is searched much more often than it is updated and all three types of query are common. If one of the types of query is much less common than the others, you'd probably settle for creating just the two indexes that best match the common types.
For the case of a polymorphic relationship, say `resource_id` and `resource_type`:

> A combination of the multicolumn index and a separate index on y would serve reasonably well.

So an index on:

- [`resource_id`, `resource_type`] (order matters)
- [`resource_type`]
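In Rails these two indexes could be added concurrently, one per migration as described above (a sketch, assuming a hypothetical polymorphic `notes` table):

```ruby
# Migration #1
class IndexNotesOnResourceIdAndType < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :notes, [:resource_id, :resource_type], algorithm: :concurrently
  end
end

# Migration #2
class IndexNotesOnResourceType < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :notes, :resource_type, algorithm: :concurrently
  end
end
```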
Do: use a join table relationship or a jsonb column
Don't: use an array type column
Most of the time if you are reaching for an array type column you would be better off normalizing the relationship into another table. If you think you absolutely must have an array type column, just use a jsonb column instead. 🤷🏽
If a `has_one` association is being used to define a relationship to a relatively static collection of records, this is often better suited as a `belongs_to` association.
Do: Reverse a `has_one` association to a `belongs_to` association and a `has_many` on the reflected model

```ruby
# columns: [:make, :model, :color_id]
class Car < ApplicationRecord
  # A car belongs to a color
  # yes it sounds funny but that's ok
  belongs_to :color
end

# columns: [:name, :hexcode, :red_value, :blue_value, :green_value]
class Color < ApplicationRecord
  has_many :cars
end
```

Don't: Create new `has_one` associations since it leads to bad data access models and event logging

```ruby
# columns: [:make, :model]
class Car < ApplicationRecord
  has_one :color
end

# columns: [:name, :car_id, :hexcode, :red_value, :blue_value, :green_value]
class Color < ApplicationRecord
  belongs_to :car
end
```
- Commonly this is essentially a more complicated enum often called an allowed value table (AVT)
- A humorous article on why
- Benefits:
- Colors table is much smaller since a single color is not repeated for every car
- Never need to update the colors table when a car's color changes
- Updating the color_id of the car is more semantic to the car's color changing
If you already have a `has_one`, you can convert it to a `belongs_to`. You essentially need to store the 'colors' in a new table and follow the steps to modify schema.
Do: Try doing an inner join to the subquery so the planner can more easily turn it into a CTE.

```sql
select posts.id, sum(comments.points)
from posts
join comments on posts.id = comments.post_id
join (
  select users.id
  from users
  where users.something = 'something'
) as real_users on posts.user_id = real_users.id
group by posts.id
```

Don't: Filter the result set of an outer query based on the result of an expensive subquery in the where clause.

```sql
select posts.id, sum(comments.points)
from posts
join comments on posts.id = comments.post_id
where posts.user_id in (
  select users.id
  from users
  where users.something = 'something'
)
group by posts.id
```
The above example is a bit contrived since the query is so simple, but as the subquery becomes more complex, the cost of performing it for every row of the outer result set becomes greater.
Counting and other aggregate methods are often the most expensive parts of a query.

- `count(DISTINCT n)` is especially slow and can often be avoided by better filtering the initial query
- Sometimes it can be beneficial to cache aggregate computations in another table
  - this can lead to much more complexity in the application, so weigh the costs against your performance needs
  - consider how often you will need to update these computations vs how often you will access the values
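One common Rails flavor of caching an aggregate is a counter cache (a sketch, assuming `Post`/`Comment` models and a `posts.comments_count` integer column):

```ruby
class Comment < ApplicationRecord
  # Rails keeps posts.comments_count up to date on comment create/destroy
  belongs_to :post, counter_cache: true
end

# Reads can then use the cached column instead of a COUNT query:
post.comments_count # instead of post.comments.count
```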
Do: Use `count(*)` as it is essentially psql syntax for `count()`.

Don't: Mess about with `count(1)` or `count(id)` as these can be slightly less efficient.
https://security.stackexchange.com/questions/93902/is-postgress-uuid-generate-v4-securely-random
Do: use `gen_random_uuid()` from the pgcrypto extension

Don't: use `uuid_generate_v4()` from the uuid-ossp extension

If you need to create cryptographically secure UUIDs (or ensure real randomness), do not use the `uuid_generate_v4()` function provided by `uuid-ossp`. That function falls back to a pseudorandom number generator (PRNG) when the cryptographically secure random number generator fails. Instead, use the `gen_random_uuid()` function from pgcrypto, which raises an exception when the secure RNG fails instead of falling back to a less secure implementation.
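In a Rails migration this usually just means enabling the pgcrypto extension (a sketch; the default-setting example appears earlier in this doc):

```ruby
class EnablePgcrypto < ActiveRecord::Migration[5.2]
  def change
    enable_extension "pgcrypto"
  end
end
```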
It is always good to set an aggressive db timeout for main application access. This means that when you are, say, updating a column for an entire table to a single default value, you must do it in batches so that it does not trigger a timeout.
Do: Use `MyModel.in_batches.update_all(new_field: 'default_value')`

Don't: Use `MyModel.update_all(new_field: 'default_value')` (without `in_batches`)
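If you need more control over batch size or pacing, `in_batches` also takes a block (a sketch with a hypothetical model):

```ruby
MyModel.in_batches(of: 10_000) do |batch|
  batch.update_all(new_field: 'default_value')
  sleep(0.01) # optional: give the DB and any replicas some breathing room
end
```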
Examples:
- 🤦🏿 Oftentimes someone new to strong_migrations will write a migration to remove a column in the normal Rails way. When running the migration they will see the strong_migrations error, with a link to the GitHub repo and a note to wrap the migration in `safety_assured { ... }`. They then simply wrap their migration in the safety_assured block, see that it runs successfully, and attempt to deploy. This is very dangerous: when the production app runs the migration it will remove the column, but the already-running application has cached its ActiveRecord attribute methods on each model, so you will get a lot of "column does not exist" errors that 500 the application until the column is added back or all application servers are restarted.
- 🥶 Writing 2 PRs at the same time and not waiting for the column-ignore PR to fully deploy before merging the column removal
Do: Ensure that the PR to ignore the column in the model is completely deployed before merging the PR to remove the column from the DB.
Don't: Merge the PR to remove the column from the DB before the deploy to ignore the column in the model is completely finished.
Do: Use `disable_ddl_transaction!` at the top of the migration class.

Don't: Use `commit_db_transaction` within a `change`, `up`, or `down` method

This is because using `commit_db_transaction` can lead to migration steps that do not get rolled back, which breaks the first tenet of safe migrations.
Do: Only perform a single concurrent task (such as adding an index concurrently) in a migration with `disable_ddl_transaction!`

```ruby
# Migration #1
class AddNameToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :name, :string
  end
end

# Migration #2
class IndexUsersOnName < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :users, :name, algorithm: :concurrently
  end
end
```

Don't: Use `disable_ddl_transaction!` in a migration that changes the schema

```ruby
# 1 migration
class AddNameToUsers < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_column :users, :name, :string
    add_index :users, :name, algorithm: :concurrently
  end
end
```
Side note: some of this gist is outdated and needs to be updated for newer versions of Postgres. Most notably, regarding default values: since PostgreSQL v11, adding a column with a static default value is no longer an unsafe operation: https://www.postgresql.org/docs/11/ddl-alter.html

This does not apply to the `gen_random_uuid()` example though, as that default is volatile.