- (Maybe/Probably) Suspend boost reporting job
- Put site in maintenance mode(?)
- Manual run of boost etl job
- Switch Production k8s endpoints to `boost02`
  - external-reporting-compute-database
  - external-reporting-database
  - PR in `infrastructure` repo: https://github.com/watermelonexpress/infrastructure/pull/673
  - Deploy ☝️
  Note: switching endpoints before promoting the standby database seems like the best way to prevent split-brain or data loss in the HA cluster, but may briefly trigger some ugly Airbrake errors. The goal is to switch endpoints, promote the standby, and stop the old master as close to simultaneously as we can manage.
  Q: Should this be a PR into `infrastructure:production` branch, or can we deploy a separate branch into Production (we do intend to switch back, after all)?
- Roll pgbouncer pods in k8s
- Switchover `boost01:5432` primary instance to `boost02:5432` by running `repmgr standby switchover` on `boost02`
- Confirm logical replication subscription in session db (refresh or rebuild as needed)
  Logical subscription in `benchprep_reporting_api_production` is pointed at db02, and in theory will pick up where it left off when we promote the standby, but my confidence in that is limited
- Order new 1.9TB SED SSD for `boost01`
- Wait for IBM to install the disk
Stop postgres on
boost01:6432
-
Copy
/var/lib/pgsql/10/data/*.conf
to/tmp/5432/
-
Copy
/mount/pgsql/10/wmx_rails_api/*conf
to/tmp/6432/
- Create new replica from base backup of `db02` on `/mount/pgsql/10/wmx_rails_api`
- Copy `*.conf` files from `/tmp/6432/` to new data directory
- Start postgres `boost01:6432` as streaming replica
- Create replica from base backup of `boost02:5432` on `/var/lib/pgsql/10/data`
- Copy `*.conf` files from `/tmp/5432/` to `/var/lib/pgsql/10/data`
- Start postgres `boost01:5432` in standby mode
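
The `boost01:6432` replica rebuild above (stop, save conf, base backup, restore conf, start) could be sketched roughly as below. This is only an illustration under stated assumptions, not the procedure actually used: the upstream port and the `replicator` role name are hypothetical, `pg_ctl` is assumed to be on the `postgres` user's PATH, and the script defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# Sketch only: rebuild the boost01:6432 streaming replica from a base backup.
# Intended to run on boost01 as the postgres OS user.
# DRY_RUN=1 (the default) prints each command instead of executing it.
set -eu

DRY_RUN="${DRY_RUN:-1}"
PGDATA="/mount/pgsql/10/wmx_rails_api"   # data directory from the checklist
CONF_BACKUP="/tmp/6432"                  # conf backup location from the checklist
UPSTREAM_HOST="${UPSTREAM_HOST:-db02}"   # upstream named in the checklist
UPSTREAM_PORT="${UPSTREAM_PORT:-5432}"   # ASSUMPTION: port is not stated in the checklist
REPL_USER="${REPL_USER:-replicator}"     # ASSUMPTION: hypothetical replication role

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rebuild_replica() {
  # Stop the old instance and save its configuration files.
  run pg_ctl -D "$PGDATA" stop -m fast
  run mkdir -p "$CONF_BACKUP"
  run cp "$PGDATA"/*.conf "$CONF_BACKUP"/

  # pg_basebackup needs an empty or missing target directory; clearing the
  # old one is deliberately left as a manual step in this sketch.
  # -X stream copies WAL while the backup runs, -P shows progress.
  # -R is omitted on the ASSUMPTION that the saved *.conf files already
  # include a recovery.conf pointing at the upstream.
  run pg_basebackup -h "$UPSTREAM_HOST" -p "$UPSTREAM_PORT" -U "$REPL_USER" \
    -D "$PGDATA" -X stream -P

  # Restore the saved configuration and start the instance as a replica.
  run cp "$CONF_BACKUP"/*.conf "$PGDATA"/
  run pg_ctl -D "$PGDATA" start
}

rebuild_replica
```

Reviewing the printed plan first and then re-running with `DRY_RUN=0` keeps the destructive steps explicit. The same shape would apply to the `boost01:5432` instance with `/var/lib/pgsql/10/data`, `/tmp/5432/`, and `boost02:5432` as the upstream.
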
- Put site in maintenance mode(?)
- Switch Production k8s endpoints to `boost01`
  - external-reporting-compute-database
  - external-reporting-database
- Roll pgbouncer pods
- Stop postgres on `boost02:5432`
- Promote `boost01:5432` from standby to master
- Out of maintenance mode
- Confirm FDW config and connection from `boost01:5432/production_boost_reporting` to `wmx_rails_api_production` on 6432 replica.
- Refresh or rebuild logical replication subscription in session db
- Re-enable boost etl cron job
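
The confirmation steps after the switch back (promotion, FDW config, logical subscription) can be checked with a few read-only catalog queries. The sketch below is an assumption-laden illustration rather than an established procedure: the session db host and port are hypothetical placeholders, `psql` is assumed to authenticate as the calling OS user, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: read-only checks after switching back to boost01.
# DRY_RUN=1 (the default) prints each psql command instead of running it.
set -eu

DRY_RUN="${DRY_RUN:-1}"
PRIMARY_HOST="boost01"
PRIMARY_PORT="5432"
REPORTING_DB="production_boost_reporting"
SESSION_DB="benchprep_reporting_api_production"
SESSION_HOST="${SESSION_HOST:-session-db.example.internal}"  # ASSUMPTION: hypothetical host
SESSION_PORT="${SESSION_PORT:-5432}"                         # ASSUMPTION: port not stated

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# usage: query HOST PORT DBNAME SQL
# -X skips ~/.psqlrc, -At prints bare unaligned rows.
query() {
  run psql -X -At -h "$1" -p "$2" -d "$3" -c "$4"
}

post_cutover_checks() {
  # 1. A promoted primary reports "f" here; a standby reports "t".
  query "$PRIMARY_HOST" "$PRIMARY_PORT" "$REPORTING_DB" \
    "SELECT pg_is_in_recovery();"

  # 2. Foreign server definitions: confirm host/port options point at the
  #    wmx_rails_api_production replica on 6432.
  query "$PRIMARY_HOST" "$PRIMARY_PORT" "$REPORTING_DB" \
    "SELECT srvname, srvoptions FROM pg_foreign_server;"

  # 3. Subscription definitions in the session db (subconninfo is only
  #    readable by superusers).
  query "$SESSION_HOST" "$SESSION_PORT" "$SESSION_DB" \
    "SELECT subname, subenabled, subconninfo FROM pg_subscription;"

  # 4. Apply worker status: a NULL pid means no worker is running.
  query "$SESSION_HOST" "$SESSION_PORT" "$SESSION_DB" \
    "SELECT subname, pid, received_lsn, latest_end_time FROM pg_stat_subscription;"
}

post_cutover_checks
```

If the subscription turns out to be stale, PostgreSQL 10 provides `ALTER SUBSCRIPTION ... REFRESH PUBLICATION` and `ALTER SUBSCRIPTION ... CONNECTION '...'`. Whether a refresh is enough or a full rebuild is needed depends on the state of the replication slot on the publisher, which these queries do not inspect.
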