A recent issue was caused by a PR being promoted to prod before the necessary Heroku Connect changes were made on prod (they were correctly made on test). It's #82 in the RCA log.
Tim flagged that this has happened several times in the past (though those incidents were not recorded in the RCA log). Can we prevent it in future?
As per the Joel Test, item 2 ("Can you make a build in one step?"): "If the process takes any more than one step, it is prone to errors".
Manually editing the Heroku Connect config and then deploying involves more than one step, so it is prone to errors.
We can reduce risk by:
Technical approach: e.g. make HC changes work like DB migrations.
Human approach: e.g. introduce a process to encourage / remind devs not to make this mistake.
Technical approaches:
a) halt deploy on config mismatch: On deploy, use psql to automatically compare the Heroku Connect schema on prod to the one on test. If there's a mismatch, halt the deploy.
Problems: prone to false positives (deploys halted for no good reason) whenever test necessarily deviates from prod. Hard to implement well. Probably not worth it in cost-benefit terms.
b) make config changes work like migrations: Use the CLI's config import to write config changes as part of the pull request, then execute them on deploy, à la DB migrations.
Problems: Once the config is imported, you have to wait for rows to sync, which will be hard or impossible to do programmatically. Likewise, surfacing errors in the config or the sync is easy through the UI but hard or impossible programmatically.
c) any other ideas?
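To make technical option (a) a bit more concrete, here is a minimal sketch of the comparison step. It is not a working deploy hook. It assumes the synced tables live in a Postgres schema named `salesforce` (the schema name may differ in our setup), that the rows come from running the query shown against each database (e.g. via psql), and the function names and the `allowed` allowlist are hypothetical. The allowlist is one way to handle cases where test legitimately deviates from prod.

```python
from collections import defaultdict

# Query whose output would be fed to load_columns() for each environment.
# The schema name "salesforce" is an assumption; adjust to the real one.
COLUMNS_SQL = """
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'salesforce'
"""


def load_columns(rows):
    """Group (table_name, column_name) rows into {table: {columns}}."""
    columns = defaultdict(set)
    for table, column in rows:
        columns[table].add(column)
    return dict(columns)


def missing_on_prod(test_cols, prod_cols, allowed=frozenset()):
    """Return sorted (table, column) pairs present on test but absent on prod.

    `allowed` (hypothetical) holds pairs where test is expected to deviate.
    """
    missing = set()
    for table, cols in test_cols.items():
        for column in cols - prod_cols.get(table, set()):
            if (table, column) not in allowed:
                missing.add((table, column))
    return sorted(missing)


# Example with made-up rows standing in for the two query results.
test_rows = [("contact", "email"), ("contact", "tier__c"), ("account", "name")]
prod_rows = [("contact", "email"), ("account", "name")]
missing = missing_on_prod(load_columns(test_rows), load_columns(prod_rows))
print(missing)
```

A deploy script would halt when `missing` is non-empty. The check only looks one way (columns on test that prod lacks), since that is the direction that caused #82; extra columns on prod are ignored.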
Human approaches:
a) add "check Heroku Connect config" to PR templates:
Problems: Probably won't work, since the PR template is checked at merge time, not deploy time. Also, if we allow that level of granularity in PR templates, we'll soon have a checklist of a dozen similar risks that devs just ignore.
b) always record even small Heroku Connect blips in the RCA log: This would leave us better equipped to spot patterns and address recurring issues further down the track.
c) any other ideas?
Or, we could do nothing and just "try to be more careful"?
I have looked at how easy it would be to automate deployment changes with HC and concluded that it's poorly supported and not worth investing in. Povoconnect might be easier to integrate into our deployment strategy.
So, looking at human approaches, what I find works well (for newly mapped columns) is to treat the HC part as a separate change and complete it before any code is merged to master, i.e. add the mapping in test, verify that the SF data and sync are good, then add the mapping in prod and verify the sync. At that point you are ready to open a PR that depends on the new SF column.