How to not make mistakes with Heroku Connect

A recent issue was caused by a PR being promoted to prod before the necessary Heroku Connect changes had been made on prod (they had been made correctly on test). It's #82 in the RCA log.

Tim flagged that this has happened several times in the past (though those incidents were not recorded in the RCA log). Can we prevent it in future?

Why it's a hard problem

As per the Joel Test, item 2 ("Can you make a build in one step?"): "If the process takes any more than one step, it is prone to errors".

Manually editing the Heroku Connect config and then deploying takes more than one step, so it is prone to errors.

2 approaches to reducing risk

We can reduce risk by:

Technical approach: e.g. make HC changes work like DB migrations.

Human approach: e.g. introduce a process that encourages / reminds devs not to make this mistake.

1) Technical Approaches

a) halt deploy on config mismatch: On deploy, use psql to automatically compare the Heroku Connect schema on prod with the one on test. If there's a mismatch, halt the deploy.
Problems: prone to false positives (deploys halted with nothing actually wrong) whenever test necessarily deviates from prod. Hard to implement well. Probably not worth it on a cost-benefit basis.
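To make option (a) concrete, here is a minimal sketch of the comparison step. It assumes each environment's columns have already been dumped as `table,column` lines, for example with `psql -At -F ','` running a query against `information_schema.columns` for the synced schema. The function names, the input format and the `ignore` list (tables where test is known to deviate from prod) are all illustrative assumptions, not something we have today.

```python
from typing import Dict, Iterable, List, Set

Schema = Dict[str, Set[str]]  # table name -> set of column names


def parse_columns(dump: str) -> Schema:
    """Parse 'table,column' lines (assumed dump format) into a Schema."""
    schema: Schema = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line:
            continue
        table, column = line.split(",", 1)
        schema.setdefault(table, set()).add(column)
    return schema


def diff_schemas(test: Schema, prod: Schema, ignore: Iterable[str] = ()) -> List[str]:
    """List tables/columns that differ between test and prod.

    Tables named in `ignore` are skipped, to cover cases where test
    legitimately deviates from prod.
    """
    ignored = set(ignore)
    problems: List[str] = []
    for table in sorted(set(test) | set(prod)):
        if table in ignored:
            continue
        test_cols = test.get(table)
        prod_cols = prod.get(table)
        if prod_cols is None:
            problems.append(f"table {table} missing on prod")
        elif test_cols is None:
            problems.append(f"table {table} missing on test")
        else:
            for col in sorted(test_cols - prod_cols):
                problems.append(f"column {table}.{col} missing on prod")
            for col in sorted(prod_cols - test_cols):
                problems.append(f"column {table}.{col} missing on test")
    return problems
```

A deploy script could exit non-zero whenever `diff_schemas(...)` returns a non-empty list. The `ignore` list is where the maintenance cost shows up: every legitimate test/prod deviation has to be recorded there, or the deploy halts for no good reason.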

b) make config changes work like migrations: Use the CLI's config import to write config changes as part of the pull request, then apply those changes on deploy, much like DB migrations.
Problems: Once the config is imported, you have to wait for rows to sync, and detecting that will be hard / impossible programmatically. Errors in the config or during sync are easy to spot in the UI, but surfacing them programmatically will be just as hard / impossible.
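If some way to read mapping state programmatically does turn out to exist, the "wait for rows to sync" step in option (b) would look roughly like the polling loop below. This is only a sketch: `get_states` is a hypothetical callable we would have to supply, and the `"synced"` / `"error"` values are placeholders, not real Heroku Connect state names.

```python
import time
from typing import Callable, Iterable

# Placeholder state names; replace with whatever the real status source reports.
DONE_STATE = "synced"
ERROR_STATE = "error"


def wait_for_sync(
    get_states: Callable[[], Iterable[str]],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every mapping reports DONE_STATE.

    Returns True once all mappings are done, False on timeout, and raises
    RuntimeError as soon as any mapping reports ERROR_STATE.
    """
    deadline = clock() + timeout_s
    while True:
        states = list(get_states())
        if ERROR_STATE in states:
            raise RuntimeError("a mapping reported an error during sync")
        if states and all(s == DONE_STATE for s in states):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The loop itself is trivial; the hard part is still implementing `get_states` reliably, which is exactly the problem noted above.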

c) any other ideas?

2) Human Approaches

a) add "check Heroku Config" to PR templates:
Problems: Probably won't work, since the PR template is checked at merge time, not deploy time. Also, if we allow that level of granularity in PR templates, we'll soon have a checklist of a dozen similar risks that devs just ignore.

b) always record even small Heroku Connect blips in the RCA log: This just means we'll be better equipped to assess patterns and address recurring issues further down the track.

c) any other ideas?

Or, we could do nothing and just "try to be more careful"?

joshuapaling/heroku-connect.md