MySQL error
2015-08-12 19:32:12 12461 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
2015-08-12 19:32:12 12461 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236

How to set up failover for MySQL

Environment for testing

Server vagrant-db1

  • Hostname: vagrant-db1
  • OS: CentOS 5.7
  • eth0: 10.0.2.15 (for internet service)
  • eth1: 192.168.179.6 (for interconnect)

Server vagrant-db2

  • Hostname: vagrant-db2
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.7 (for interconnect)

Server vagrant-web1

  • Hostname: vagrant-web1
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.8 (for interconnect)

VIP

Used by MySQL clients to connect to the current DB master.

  • 192.168.179.100

Specification

Normal operation

  • vagrant-db1 is the DB master, vagrant-db2 is the DB slave, and the VIP points to vagrant-db1.
  • Or vagrant-db2 is the DB master, vagrant-db1 is the DB slave, and the VIP points to vagrant-db2.
  • MySQL clients use the VIP to access the MySQL server (see the example below).
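
For example, a client on vagrant-web1 connects through the VIP rather than a node address (the user and database names here are only illustrative):

vagrant-web1$ mysql -h 192.168.179.100 -u someuser -p somedatabase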

When failure occurs

Case A: DB master has failure

The DB slave becomes the DB master, and the VIP points to the new DB master.

To recover, set up a CentOS instance from scratch and make it the new DB slave.

Case B: DB slave has failure

The DB master detects that the DB slave has shut down.

To recover, set up a CentOS instance from scratch and make it the new DB slave.

Setup

Install software into vagrant-web1

Install the following software for stress testing.

vagrant-web1$ sudo yum install mysql-bench perl-DBD-MySQL

Install software into vagrant-db1

Install the following software for failover:

  • heartbeat: 3.0.5
  • pacemaker: 1.0.13
  • mysql-server: 5.0.95
vagrant-db1$ sudo yum install mysql-server which
vagrant-db1$ cd /tmp
vagrant-db1$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db1$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db1$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db1$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64

Setup MySQL on vagrant-db1

Set up MySQL.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1$ sudo /sbin/service mysqld start
vagrant-db1$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1$ sudo /sbin/chkconfig mysqld on
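
To double-check the result (optional; this simply reads back what the GRANT statements above created), you can inspect the replication user and confirm that the binary log is enabled:

vagrant-db1$ mysql -u root -p
mysql> SHOW GRANTS FOR 'repl'@'%';
mysql> SHOW MASTER STATUS;
mysql> QUIT;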

Setup Heartbeat on vagrant-db1

Set up Heartbeat.

vagrant-db1$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db1$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db1$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db1$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db1$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log

Boot Pacemaker.

vagrant-db1$ sudo /sbin/service heartbeat start
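
If you want to confirm that Heartbeat is logging through the local1 facility configured above, you can watch the log file named in the syslog rule:

vagrant-db1$ sudo tail -f /var/log/ha-log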

Install software into vagrant-db2

Install the following software for failover.

vagrant-db2$ sudo yum install mysql-server which
vagrant-db2$ cd /tmp
vagrant-db2$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db2$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db2$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db2$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64

Setup MySQL on vagrant-db2

Set up MySQL.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2$ sudo /sbin/service mysqld start
vagrant-db2$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2$ sudo /sbin/chkconfig mysqld on

Setup Heartbeat on vagrant-db2

Set up Heartbeat.

vagrant-db2$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db2$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db2$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db2$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db2$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log

Boot Pacemaker.

vagrant-db2$ sudo /sbin/service heartbeat start

Check status of Pacemaker

Please check the Pacemaker status using the crm_mon command. It takes about one minute for everything to come online.

vagrant-db2$ sudo /usr/sbin/crm_mon
============
Last updated: Fri Jul 10 18:40:44 2015
Stack: Heartbeat
Current DC: vagrant-db2 (ca03e33c-82bb-4da9-bf64-bba48df33141) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ vagrant-db1 vagrant-db2 ]

Setup Pacemaker on vagrant-db1

Set up Pacemaker using the crm command.

vagrant-db1$ sudo bash
vagrant-db1# export PATH=$PATH:/usr/sbin
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# crm configure
crm(live)configure# primitive vip_192.168.179.100 ocf:heartbeat:IPaddr2 params ip="192.168.179.100" cidr_netmask="24" nic="eth1"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node vagrant-db1
crm(live)configure# node vagrant-db2
crm(live)configure# commit
crm(live)configure# quit
vagrant-db1# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
vagrant-db1# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
vagrant-db1# crm configure colocation vip_on_mysql inf: vip_192.168.179.100 mysql-clone:Master
vagrant-db1# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.179.100:start
vagrant-db1# crm node online vagrant-db2
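
Before moving on, it may be worth reading the configuration back to confirm it was committed (a read-only check):

vagrant-db1# crm configure show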

At this point, MySQL replication is ready.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Testing replication

Create a table on vagrant-db1

First, create a table on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> create database example;
mysql> create table example.dummy (`id` varchar(10));
mysql> show tables in example;
mysql> quit;

Check the table and create another table on vagrant-db2

Please check that the table has replicated to vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> show tables in example;
mysql> quit;
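
If you also want to confirm that row data replicates, not just table definitions, a quick check like the following should work (the inserted value is only an example):

vagrant-db1# mysql -u root -p
mysql> insert into example.dummy values ('hello');
mysql> quit;
vagrant-db2# mysql -u root -p
mysql> select * from example.dummy;
mysql> quit;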

Switch the database master to vagrant-db2, then create a table on vagrant-db2.

vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db2# mysql -u root -p
mysql> create database example2;
mysql> create table example2.dummy (`id` varchar(100));
mysql> show tables in example2;
mysql> quit;

Check the table on vagrant-db1

Please check that the table has replicated to vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> show tables in example2;
mysql> quit;

Testing on test environment

Before the testing, vagrant-db1 should be master and vagrant-db2 should be slave.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Test A: Vagrant-db1 halts

To emulate the test case, simply shut down vagrant-db1.

vagrant-db1# /sbin/shutdown -h now

After the testing, vagrant-db2 should be master.

vagrant-db2# crm_mon
Online: [ vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

To recover, set up a CentOS instance from scratch and do the following:

  • Install software into vagrant-db1
  • Setup Heartbeat on vagrant-db1

Get a MySQL database dump on vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000012 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db2# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> UNLOCK TABLES;

Then restore the MySQL database on vagrant-db1.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# mysql -u root -p < /vagrant_data/db.dump
vagrant-db1# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db1.

vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Make vagrant-db1 the master again and point vagrant-db2 at its master status.

vagrant-db1# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# sleep 10
vagrant-db1# crm node online vagrant-db2
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test B: Vagrant-db1 is temporarily disconnected

To emulate the test case, simply reboot vagrant-db1.

vagrant-db1# reboot

After the testing, vagrant-db2 should be master and vagrant-db1 should be slave.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

Pacemaker fixes this automatically.

Test C: MySQL server on vagrant-db1 is killed

To emulate the test case, simply stop the MySQL server.

vagrant-db1# /sbin/service mysqld stop

After the testing, vagrant-db2 should be master, and crm_mon should report some failed actions for vagrant-db1.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

Failed actions:
    mysql:0_monitor_10000 (node=vagrant-db1, call=21, rc=7, status=complete): not running
    mysql:0_demote_0 (node=vagrant-db1, call=64, rc=7, status=complete): not running
    mysql:0_start_0 (node=vagrant-db1, call=68, rc=1, status=complete): unknown error

To fix it, start MySQL and restart Pacemaker on vagrant-db1.

vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

Test D: Vagrant-db2 halts

To emulate the test case, simply shut down vagrant-db2.

vagrant-db2# /sbin/shutdown -h now

After the testing, vagrant-db1 should be master.

vagrant-db1# crm_mon
Online: [ vagrant-db1 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Stopped: [ mysql:1 ]

To recover, set up a CentOS instance from scratch and do the following:

  • Install software into vagrant-db2
  • Setup Heartbeat on vagrant-db2

Get a MySQL database dump on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000011 |      270 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db1# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> UNLOCK TABLES;

Then restore the MySQL database on vagrant-db2.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2# /sbin/service mysqld start
vagrant-db2# /sbin/chkconfig mysqld on
vagrant-db2# mysql -u root -p < /vagrant_data/db.dump
vagrant-db2# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Make vagrant-db2 the master and point vagrant-db1 at its master status.

vagrant-db2# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test E: Vagrant-db2 is temporarily disconnected

To emulate the test case, simply reboot vagrant-db2.

vagrant-db2# reboot

After the testing, vagrant-db1 should be master and vagrant-db2 should be slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Test F: MySQL server on vagrant-db2 is killed

To emulate the test case, simply stop the MySQL server.

vagrant-db2# /sbin/service mysqld stop

After the testing, vagrant-db1 should be master and vagrant-db2 should be slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Stress testing on test environment

Before the testing, vagrant-db1 should be master and vagrant-db2 should be slave.

How to run mysql-bench

Add "sysbench" user and "test" database on MySQL.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db1# mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON *.* TO 'sysbench'@"%" identified by 'sysbenchpass';
mysql> FLUSH PRIVILEGES;
mysql> CREATE DATABASE test;

Run mysql-bench on vagrant-web1.

vagrant-web1# cd /usr/share/sql-bench
vagrant-web1# while true; do sudo ./test-insert --server=mysql --host=192.168.179.100 --user=sysbench --password=sysbenchpass; done
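
Before starting the loop, it may help to confirm that the sysbench user can actually reach the current master through the VIP (a simple connectivity check):

vagrant-web1# mysql -h 192.168.179.100 -u sysbench -psysbenchpass -e 'select 1'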

How to check SLAVE STATUS

Switch the database master to vagrant-db2, then check that SLAVE STATUS shows Yes on vagrant-db1.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Switch the database master to vagrant-db1, then check that SLAVE STATUS shows Yes on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test A': Vagrant-db1 halts, with stress

Do "Test A: Vagrant-db1 halts" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Test B': Vagrant-db1 is temporarily disconnected, with stress

Do "Test B: Vagrant-db1 is temporarily disconnected" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Test C': MySQL server on vagrant-db1 is killed, with stress

Do "Test C: MySQL server on vagrant-db1 is killed" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Testing on production environment

Before the testing, vagrant-db1 should be master and vagrant-db2 should be slave. The following tests need a person who manually updates user settings in a web browser.

Test A'': Vagrant-db1 halts while using product

Do "Test A: Vagrant-db1 halts" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test B'': Vagrant-db1 is temporarily disconnected while using product

Do "Test B: Vagrant-db1 is temporarily disconnected" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test C'': MySQL server on vagrant-db1 is killed while using product

Do "Test C: MySQL server on vagrant-db1 is killed" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test G'': Web servers are temporarily disconnected while using product

Reboot the first and second web servers while user settings are being updated continuously. Finally, check SLAVE STATUS.

Troubleshooting

Error "mysql:0_start_0 (node=vagrant-db1 ...): unknown error" occurs on crm_mon

To fix it, run the following on the affected server.

vagrant-db1# /sbin/service heartbeat stop
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat start

Error "Slave I/O thread: error connecting to master" occurs on mysql.log

vagrant-db2# less /var/log/mysqld.log
150804  8:55:17 [ERROR] Slave I/O thread: error connecting to master 'repl@vagrant-db1:3306': Error: 'Access denied for user 'repl'@'vagrant-db2' (using password: YES)'  errno: 1045  retry-time: 60  retries: 86400

xxx
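
No fix is recorded here. One plausible cause is that the repl grant is missing (or has a different password) on the current master, so a hedged recovery sketch, not the documented fix, is to re-create the grant on the master and restart the slave I/O thread:

vagrant-db1# mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G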

Error "Could not find first log file name in binary log index file" occurs on mysql.log

vagrant-db1# less /var/log/mysqld.log
150805 21:42:42 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
150805 21:42:42 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log

xxx
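
No fix is recorded here either. This error means the slave is asking for a binary log file that is no longer in the master's binlog index, so a hedged recovery sketch (mirroring the rebuild procedure in Test A; the log file and position below are placeholders to be replaced with the values reported by SHOW MASTER STATUS) is to re-point the slave at the master's current position after reloading a fresh dump:

vagrant-db2# mysql -u root -p
mysql> SHOW MASTER STATUS;
mysql> QUIT;
vagrant-db1# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000012', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G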

Error "mysql:1_monitor_20000 (node=vagrant-db2 ...): master" occurs on crm_mon

vagrant-db2# crm_mon
mysql:1_monitor_20000 (node=vagrant-db2, call=9, rc=8, status=complete): master

xxx
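
No fix is recorded here. Return code 8 (OCF_RUNNING_MASTER) from a monitor means MySQL is running as a master on a node Pacemaker expects to be a slave. A hedged guess, following the same pattern as Test C above, is to restart Heartbeat on that node so the resource is re-probed, then clear the failed action:

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm resource cleanup mysql-clone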

@master-q

2015-08-18 16:50:38 7081 [ERROR] Slave I/O: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it). Error_code: 1593

@master-q

CHANGE MASTER TO MASTER_HOST="centillion.db02", MASTER_USER="repl", MASTER_PASSWORD="slavepass", MASTER_LOG_FILE="mysql-bin.004876", MASTER_LOG_POS=120;

@master-q

$ sudo yum install OpenIPMI-libs PyXML libaio libtool-ltdl libxslt lm_sensors net-snmp-libs openhpi-libs perl-TimeDate
$ sudo rpm -i cluster-glue-1.0.11-1.1.el5.x86_64.rpm cluster-glue-libs-1.0.11-1.1.el5.x86_64.rpm corosync-1.4.6-1.el5.x86_64.rpm corosynclib-1.4.6-1.el5.x86_64.rpm heartbeat-3.0.5-1.1.el5.x86_64.rpm heartbeat-libs-3.0.5-1.1.el5.x86_64.rpm libesmtp-1.0.4-5.el5.x86_64.rpm pacemaker-1.0.13-2.el5.x86_64.rpm pacemaker-libs-1.0.13-2.el5.x86_64.rpm resource-agents-3.9.5-1.310.0d443.el5.x86_64.rpm

@master-q

$ cat /etc/corosync/corosync.conf
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.10.0
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

nodelist {
        node {
                ring0_addr: 192.168.10.136
                nodeid: 1
        }

        node {
                ring0_addr: 192.168.10.125
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

@master-q

$ ocf-tester -n mysql_repl -o binary=/usr/local/mysql/bin/mysqld_safe -o datadir=/data/mysql -o pid=/data/mysql/mysql.pid -o socket=/tmp/mysql.sock -o log=/data/mysql/centillion.db.err -o replication_user=repl -o replication_passwd=slavepass /usr/lib/ocf/resource.d/heartbeat/mysql

@master-q

$ ocf-tester -n mysql_repl -o binary=/usr/bin/mysqld_safe -o pid=/var/run/mysqld/mysqld.pid -o replication_user=repl -o replication_passwd=slavepass /usr/lib/ocf/resource.d/heartbeat/mysql

@master-q

Failed actions:
    mysql_monitor_20000 on centillion.db01 'master' (8): call=28, status=complete, last-rc-change='Fri Aug 21 17:22:37 2015', queued=53ms, exec=122ms

@master-q

Aug 21 17:40:42 [32318] centillion.db02    pengine:    debug: determine_op_status:      mysql_monitor_20000 on centillion.db02 returned 'master' (8) instead of the expected value: 'ok' (0)
Aug 21 17:40:42 [32318] centillion.db02    pengine:  warning: unpack_rsc_op_failure:    Processing failed op monitor for mysql:0 on centillion.db02: master (8)

@master-q

Normal mysql_ocf behavior

Mon, 24 Aug 2015 14:56:13 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:13 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:56:15 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:16 +0900: Start method=demote
Mon, 24 Aug 2015 14:56:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:20 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:20 +0900: MySQL monitor succeeded
Mon, 24 Aug 2015 14:56:20 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:20 +0900: MySQL monitor succeeded
Mon, 24 Aug 2015 14:56:24 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:26 +0900: Start method=stop

Abnormal mysql_ocf behavior

Mon, 24 Aug 2015 14:49:13 +0900: Start method=monitor
Mon, 24 Aug 2015 14:49:13 +0900: myql_common_status info => 7
Mon, 24 Aug 2015 14:49:13 +0900: Start method=start
Mon, 24 Aug 2015 14:49:17 +0900: myql_common_status err => 0
Mon, 24 Aug 2015 14:49:17 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:18 +0900: Start method=monitor
Mon, 24 Aug 2015 14:49:18 +0900: myql_common_status err => 0
Mon, 24 Aug 2015 14:49:18 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=demote
Mon, 24 Aug 2015 14:49:19 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=stop

@master-q

$ cat /etc/resolv.conf
search SSG5-ISDN
nameserver 192.168.10.139
nameserver 61.122.116.132
nameserver 61.122.116.165
$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
#HWADDR="00:0C:29:DE:3B:44"
NM_CONTROLLED="yes"
ONBOOT="yes"
BOOTPROTO=static
IPADDR=192.168.10.136
NETMASK=255.255.255.0
GATEWAY=192.168.10.1

@master-q

centillion.db01# crm node standby centillion.db02
centillion.db01# crm configure
crm(live)configure# primitive vip_192.168.10.200 ocf:heartbeat:IPaddr2 params ip="192.168.10.200" cidr_netmask="24" nic="eth0"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node centillion.db01
crm(live)configure# node centillion.db02
crm(live)configure# commit
crm(live)configure# quit
centillion.db01# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
centillion.db01# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
centillion.db01# crm configure colocation vip_on_mysql inf: vip_192.168.10.200 mysql-clone:Master
centillion.db01# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.10.200:start
centillion.db01# crm node online centillion.db02

@master-q

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| gaitame_chart      |
| gaitame_chart2     |
| mysql              |
| push_chart         |
| stb                |
| test               |
+--------------------+
7 rows in set (0.00 sec)

@master-q

mysql -h 192.168.10.200 -u root -p

Enter password: root
