- Entry point (`PlannedReparentShard` in vtctl): https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/vtctl/reparent.go#L98
- DemoteMaster: https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/wrangler/reparent.go#L423
  - Set serving = false in the Vitess topo: https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/vttablet/tabletmanager/rpc_replication.go#L305
  - Set MySQL to read-only (RO)
  - `FLUSH TABLES WITH READ LOCK` (why? this waits for long-running SELECTs to finish): https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/mysqlctl/reparent.go#L97-L104
  - `UNLOCK TABLES`
  - `SELECT @@GLOBAL.gtid_executed` to read the demoted master's final GTID position
- PromoteSlaveWhenCaughtUp (on the master-elect): https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/wrangler/reparent.go#L435. The timeout for this call is set here: https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/wrangler/reparent.go#L428; by default it is 30s: https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/topo/locks.go#L53
  - Wait for replication to catch up: `SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('%s', %v)`. The timeout argument is computed here: https://github.com/dasl-/vitess/blob/managed-tablet/go/mysql/flavor_mysql.go#L134
  - `SET @@global.read_only = false`
  - Change the tablet type in the topo to master
    - `agent.refreshTablet`: https://github.com/dasl-/vitess/blob/master/go/vt/vttablet/tabletmanager/rpc_replication.go#L459
    - `agent.updateState`: https://github.com/dasl-/vitess/blob/master/go/vt/vttablet/tabletmanager/state_change.go#L157
    - `agent.broadcastHealth` (lets vtgate know we're serving): https://github.com/dasl-/vitess/blob/master/go/vt/vttablet/tabletmanager/state_change.go#L341
- Insert a row into the reparent journal table on the new master: https://github.com/dasl-/vitess/blob/f23777db959c7bb6be128d091e81f47e04bbc2a8/go/vt/wrangler/reparent.go#L473
- Reparent all replicas to the new master: https://github.com/dasl-/vitess/blob/managed-tablet/go/vt/wrangler/reparent.go#L460
  - Run a `CHANGE MASTER TO ...` query: https://github.com/dasl-/vitess/blob/f23777db959c7bb6be128d091e81f47e04bbc2a8/go/vt/vttablet/tabletmanager/rpc_replication.go#L540
  - Wait for the row inserted into the master's reparent journal to replicate: https://github.com/dasl-/vitess/blob/f23777db959c7bb6be128d091e81f47e04bbc2a8/go/vt/vttablet/tabletmanager/rpc_replication.go#L563
- Run `UpdateShardFields` in the topology: https://github.com/dasl-/vitess/blob/master/go/vt/wrangler/reparent.go#L502
The docs describe `-wait_slave_timeout duration` as: "time to wait for slaves to catch up in reparenting (default 30s)".
`waitSlaveTimeout` is used for:
- searching for the best new master candidate (the replica that has executed the most transactions)
- reparenting all replicas to the new master after the new master has been enabled.
I'd expect it to also be used as the timeout for PromoteSlaveWhenCaughtUp. A PR to do so has been merged: https://github.com/vitessio/vitess/commit/baa15c2571257ed2bd00cc9f38f0d748eeeaa6d2
We recently had an incident while using PlannedReparentShard: https://github.etsycorp.com/gist/dleibovic/b666ad2c0d3a1f9ec612f413c4adfa18