MySQL error
2015-08-12 19:32:12 12461 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
2015-08-12 19:32:12 12461 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236

How to set up failover for MySQL

Environment for testing

Server vagrant-db1

  • Hostname: vagrant-db1
  • OS: CentOS 5.7
  • eth0: 10.0.2.15 (for internet service)
  • eth1: 192.168.179.6 (for interconnect)

Server vagrant-db2

  • Hostname: vagrant-db2
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.7 (for interconnect)

Server vagrant-web1

  • Hostname: vagrant-web1
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.8 (for interconnect)

VIP

Used by MySQL clients to connect to the current DB master.

  • 192.168.179.100

Specification

Normal operation

  • vagrant-db1 is the DB master, vagrant-db2 is the DB slave, and the VIP points to vagrant-db1.
  • Or vagrant-db2 is the DB master, vagrant-db1 is the DB slave, and the VIP points to vagrant-db2.
  • MySQL clients use the VIP to access the MySQL server (see the example below).
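For example, a client on vagrant-web1 would connect through the VIP rather than to a specific host (a minimal sketch, assuming the mysql client is installed and using whatever credentials were set up on the DB servers):

vagrant-web1$ mysql -h 192.168.179.100 -u root -p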

When failure occurs

Case A: DB master has failure

The DB slave becomes the new DB master, and the VIP moves to point to it.

To recover, set up a new CentOS instance from scratch and configure it as the DB slave (the detailed procedure is shown later under Test A and Test D).

Case B: DB slave has failure

The DB master detects that the DB slave has shut down.

To recover, set up a new CentOS instance from scratch and configure it as the DB slave.

Setup

Install software into vagrant-web1

Install the following software for stress testing.

vagrant-web1$ sudo yum install mysql-bench perl-DBD-MySQL

Install software into vagrant-db1

Install the following software for failover:

  • heartbeat: 3.0.5
  • pacemaker: 1.0.13
  • mysql-server: 5.0.95
vagrant-db1$ sudo yum install mysql-server which
vagrant-db1$ cd /tmp
vagrant-db1$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db1$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db1$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db1$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64

Setup MySQL on vagrant-db1

Set up MySQL.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1$ sudo /sbin/service mysqld start
vagrant-db1$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1$ sudo /sbin/chkconfig mysqld on

Setup Heartbeat on vagrant-db1

Set up Heartbeat.

vagrant-db1$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db1$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db1$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db1$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db1$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log

Start Pacemaker.

vagrant-db1$ sudo /sbin/service heartbeat start

Install software into vagrant-db2

Install the following software for failover.

vagrant-db2$ sudo yum install mysql-server which
vagrant-db2$ cd /tmp
vagrant-db2$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db2$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db2$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db2$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64

Setup MySQL on vagrant-db2

Set up MySQL.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2$ sudo /sbin/service mysqld start
vagrant-db2$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2$ sudo /sbin/chkconfig mysqld on

Setup Heartbeat on vagrant-db2

Set up Heartbeat.

vagrant-db2$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db2$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db2$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db2$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db2$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log

Start Pacemaker.

vagrant-db2$ sudo /sbin/service heartbeat start

Check status of Pacemaker

Check the Pacemaker status with the crm_mon command. It takes about one minute for everything to come online.

vagrant-db2$ sudo /usr/sbin/crm_mon
============
Last updated: Fri Jul 10 18:40:44 2015
Stack: Heartbeat
Current DC: vagrant-db2 (ca03e33c-82bb-4da9-bf64-bba48df33141) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ vagrant-db1 vagrant-db2 ]

Setup Pacemaker on vagrant-db1

Set up Pacemaker using the crm command.

vagrant-db1$ sudo bash
vagrant-db1# export PATH=$PATH:/usr/sbin
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# crm configure
crm(live)configure# primitive vip_192.168.179.100 ocf:heartbeat:IPaddr2 params ip="192.168.179.100" cidr_netmask="24" nic="eth1"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node vagrant-db1
crm(live)configure# node vagrant-db2
crm(live)configure# commit
crm(live)configure# quit
vagrant-db1# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
vagrant-db1# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
vagrant-db1# crm configure colocation vip_on_mysql inf: vip_192.168.179.100 mysql-clone:Master
vagrant-db1# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.179.100:start
vagrant-db1# crm node online vagrant-db2

At this point, MySQL replication is ready.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
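
You can also confirm replication from the MySQL side on the slave (a quick check, assuming the root password set during MySQL setup); both replication threads should report Yes:

vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...
mysql> quit;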

Testing replication

Create a table on vagrant-db1

First, try creating a table on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> create database example;
mysql> create table example.dummy (`id` varchar(10));
mysql> show tables in example;
mysql> quit;

Check the table and create another table on vagrant-db2

Check that the table is visible on vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> show tables in example;
mysql> quit;

Switch the database master to vagrant-db2, then try creating a table on vagrant-db2.

vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db2# mysql -u root -p
mysql> create database example2;
mysql> create table example2.dummy (`id` varchar(100));
mysql> show tables in example2;
mysql> quit;

Check the table on vagrant-db1

Check that the table is visible on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> show tables in example2;
mysql> quit;

Testing on test environment

Before testing, vagrant-db1 should be the master and vagrant-db2 the slave.
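
If vagrant-db2 currently holds the master role (for example, right after the switchover in the previous section), it can be moved back with the same standby/online sequence used earlier; a short sketch:

vagrant-db1# crm node standby vagrant-db2
vagrant-db1# sleep 10
vagrant-db1# crm node online vagrant-db2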

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Test A: Vagrant-db1 halts

To emulate this case, simply shut down vagrant-db1.

vagrant-db1# /sbin/shutdown -h now

After this test, vagrant-db2 should be the master.

vagrant-db2# crm_mon
Online: [ vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

To recover, set up a new CentOS instance from scratch and do the following:

  • Install software into vagrant-db1
  • Setup Heartbeat on vagrant-db1

Take a MySQL database dump on vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000012 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db2# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> UNLOCK TABLES;
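
As an alternative sketch (assuming this mysqldump build supports the --master-data option), you can let mysqldump record the binlog coordinates inside the dump instead of reading SHOW MASTER STATUS by hand; the coordinates then appear as a commented-out CHANGE MASTER statement near the top of the dump file:

vagrant-db2# mysqldump -u root -p -x --master-data=2 --all-databases > /vagrant_data/db.dump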

Then restore the MySQL database on vagrant-db1.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# mysql -u root -p < /vagrant_data/db.dump
vagrant-db1# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db1.

vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Make vagrant-db1 the master again and set its master status on vagrant-db2.

vagrant-db1# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# sleep 10
vagrant-db1# crm node online vagrant-db2
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test B: Vagrant-db1 is temporarily disconnected

To emulate this case, simply reboot vagrant-db1.

vagrant-db1# reboot

After this test, vagrant-db2 should be the master and vagrant-db1 the slave.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

Pacemaker fixes this automatically.

Test C: MySQL server on vagrant-db1 is killed

To emulate this case, simply stop the MySQL server on vagrant-db1.

vagrant-db1# /sbin/service mysqld stop

After this test, vagrant-db2 should be the master, and the MySQL resource on vagrant-db1 should be stopped with some failed actions.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

Failed actions:
    mysql:0_monitor_10000 (node=vagrant-db1, call=21, rc=7, status=complete): not running
    mysql:0_demote_0 (node=vagrant-db1, call=64, rc=7, status=complete): not running
    mysql:0_start_0 (node=vagrant-db1, call=68, rc=1, status=complete): unknown error

To fix it, start MySQL and restart Heartbeat/Pacemaker on vagrant-db1.

vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

Test D: Vagrant-db2 halts

To emulate this case, simply shut down vagrant-db2.

vagrant-db2# /sbin/shutdown -h now

After this test, vagrant-db1 should be the master.

vagrant-db1# crm_mon
Online: [ vagrant-db1 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Stopped: [ mysql:1 ]

To recover, set up a new CentOS instance from scratch and do the following:

  • Install software into vagrant-db2
  • Setup Heartbeat on vagrant-db2

Take a MySQL database dump on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000011 |      270 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db1# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> UNLOCK TABLES;

Then restore the MySQL database on vagrant-db2.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2# /sbin/service mysqld start
vagrant-db2# /sbin/chkconfig mysqld on
vagrant-db2# mysql -u root -p < /vagrant_data/db.dump
vagrant-db2# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Make vagrant-db2 the master again and set its master status on vagrant-db1.

vagrant-db2# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test E: Vagrant-db2 is temporarily disconnected

To emulate this case, simply reboot vagrant-db2.

vagrant-db2# reboot

After this test, vagrant-db1 should be the master and vagrant-db2 the slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Test F: MySQL server on vagrant-db2 is killed

To emulate this case, simply stop the MySQL server on vagrant-db2.

vagrant-db2# /sbin/service mysqld stop

After this test, vagrant-db1 should be the master and vagrant-db2 the slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Stress testing on test environment

Before testing, vagrant-db1 should be the master and vagrant-db2 the slave.

How to run mysql-bench

Add "sysbench" user and "test" database on MySQL.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db1# mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON *.* TO 'sysbench'@"%" identified by 'sysbenchpass';
mysql> FLUSH PRIVILEGES;
mysql> CREATE DATABASE test;

Run mysql-bench on vagrant-web1.

vagrant-web1# cd /usr/share/sql-bench
vagrant-web1# while true; do sudo ./test-insert --server=mysql --host=192.168.179.100 --user=sysbench --password=sysbenchpass; done

How to check SLAVE STATUS

Switch the database master to vagrant-db2, then check that SLAVE STATUS shows Yes on vagrant-db1.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Switch the database master to vagrant-db1, then check that SLAVE STATUS shows Yes on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test A': Vagrant-db1 halts, with stress

Do "Test A: Vagrant-db1 halts" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Test B': Vagrant-db1 is temporarily disconnected, with stress

Do "Test B: Vagrant-db1 is temporarily disconnected" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Test C': MySQL server on vagrant-db1 is killed, with stress

Do "Test C: MySQL server on vagrant-db1 is killed" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Testing on production environment

Before testing, vagrant-db1 should be the master and vagrant-db2 the slave. The following tests require someone who continuously updates user settings in a web browser.

Test A'': Vagrant-db1 halts while using product

Do "Test A: Vagrant-db1 halts" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test B'': Vagrant-db1 is temporarily disconnected while using product

Do "Test B: Vagrant-db1 is temporarily disconnected" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test C'': MySQL server on vagrant-db1 is killed while using product

Do "Test C: MySQL server on vagrant-db1 is killed" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test G'': Web servers are temporarily disconnected while using product

Reboot the 1st and 2nd web servers while user settings are being updated continuously. Finally, check SLAVE STATUS.

Troubleshooting

Error "mysql:0_start_0 (node=vagrant-db1 ...): unknown error" occurs on crm_mon

To fix it, run the following on the affected server.

vagrant-db1# /sbin/service heartbeat stop
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat start

Error "Slave I/O thread: error connecting to master" occurs on mysql.log

vagrant-db2# less /var/log/mysqld.log
150804  8:55:17 [ERROR] Slave I/O thread: error connecting to master 'repl@vagrant-db1:3306': Error: 'Access denied for user 'repl'@'vagrant-db2' (using password: YES)'  errno: 1045  retry-time: 60  retries: 86400

The slave cannot authenticate as the replication user ('repl') against the current master, so replication stops.
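
One possible fix (a sketch, assuming the repl account and password from the setup sections) is to re-create the replication grant on the current master and then restart the slave threads:

vagrant-db1# mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G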

Error "Could not find first log file name in binary log index file" occurs on mysql.log

vagrant-db1# less /var/log/mysqld.log
150805 21:42:42 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
150805 21:42:42 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log

The slave is requesting a binary log file that the master no longer has in its binlog index. One way to recover is to rebuild the slave from a fresh dump and reset the replication coordinates with CHANGE MASTER, as described in the fix for "Test A: Vagrant-db1 halts".

Error "mysql:1_monitor_20000 (node=vagrant-db2 ...): master" occurs on crm_mon

vagrant-db2# crm_mon
mysql:1_monitor_20000 (node=vagrant-db2, call=9, rc=8, status=complete): master

xxx

============
Last updated: Fri Aug 14 17:28:58 2015
Stack: Heartbeat
Current DC: centillion.db02 (0302e3d0-df06-4847-b0f9-9ebddfb6aec7) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ centillion.db01 centillion.db02 ]

vip_192.168.10.200      (ocf::heartbeat:IPaddr2):       Started centillion.db02
 Master/Slave Set: mysql-clone
     mysql:0    (ocf::heartbeat:mysql): Master centillion.db01 FAILED
     Masters: [ centillion.db02 ]

Failed actions:
    mysql:0_monitor_20000 (node=centillion.db01, call=166, rc=8, status=complete): master
    mysql:0_monitor_30000 (node=centillion.db01, call=167, rc=8, status=complete): master

2015-08-14 17:38:54 17946 [ERROR] Slave SQL: Error 'Can't drop database 'test'; database doesn't exist' on query. Default database: 'test'. Query: 'drop database test', Error_code: 1008
2015-08-14 17:38:54 17946 [Warning] Slave: Can't drop database 'test'; database doesn't exist Error_code: 1008
2015-08-14 17:38:54 17946 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000003' position 120

2015-08-14 17:48:44 27811 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
2015-08-14 17:48:44 27811 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236

2015-08-17 18:54:25 31647 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.6.16-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
2015-08-17 18:54:28 31647 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2015-08-17 18:54:28 31647 [Note] Slave I/O thread: connected to master '[email protected]:3306',replication started in log 'mysql-bin.004861' at position 120
2015-08-17 18:54:28 31647 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2015-08-17 18:54:28 31647 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.004861' at position 120, relay log './mysql-relay-bin.000016' position: 283
2015-08-17 18:54:40 31647 [Note] /usr/local/mysql/bin/mysqld: Normal shutdown
2015-08-17 18:54:40 31647 [Note] Giving 2 client threads a chance to die gracefully
2015-08-17 18:54:40 31647 [Note] Event Scheduler: Purging the queue. 0 events
2015-08-17 18:54:40 31647 [Note] Shutting down slave threads
2015-08-17 18:54:40 31647 [Note] Error reading relay log event: slave SQL thread was killed
2015-08-17 18:54:40 31647 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2015-08-17 18:54:40 31647 [Note] Slave I/O thread killed while reading event
2015-08-17 18:54:40 31647 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.004861', position 120
2015-08-17 18:54:40 31647 [Note] Forcefully disconnecting 0 remaining clients
2015-08-17 18:54:40 31647 [Note] Binlog end

Processes when mysqld is started via /etc/init.d/mysql:

[root@centillion data]# ps aux|grep mysql
root     30119  0.0  0.0 106188  1536 pts/0    S    18:51   0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql --pid-file=/data/mysql/mysql.pid
mysql    30330  0.6 50.1 1713332 963696 pts/0  Sl   18:51   0:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysql/centillion.db01.err --pid-file=/data/mysql/mysql.pid

Processes when mysqld is started via /etc/init.d/heartbeat:

[root@centillion data]# ps aux|grep mysql
root     32171  0.0  0.0   9328  1460 ?        S    18:54   0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock --datadir=/data/mysql --log-error=/var/log/mysqld.log --user=mysql --skip-slave-start
mysql    32430  7.4 50.0 1779132 961680 ?      Sl   18:54   0:00 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --skip-slave-start --log-error=/var/log/mysqld.log --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock
root     32554  0.0  0.0 107448   960 pts/0    S+   18:54   0:00 grep mysql
[root@centillion data]# ps aux|grep mysql
root     32171  0.0  0.0   9328  1460 ?        S    18:54   0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock --datadir=/data/mysql --log-error=/var/log/mysqld.log --user=mysql --skip-slave-start
mysql    32430  5.2 50.0 1779132 961684 ?      Sl   18:54   0:00 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --skip-slave-start --log-error=/var/log/mysqld.log --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock
root     32581  0.0  0.0 107448   960 pts/0    S+   18:54   0:00 grep mysql

CRM settings for staging.

centillion.db01# crm node standby centillion.db02
centillion.db01# crm configure
crm(live)configure# primitive vip_192.168.10.200 ocf:heartbeat:IPaddr2 params ip="192.168.10.200" cidr_netmask="24" nic="eth0"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node centillion.db01
crm(live)configure# node centillion.db02
crm(live)configure# commit
crm(live)configure# quit
centillion.db01# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/local/mysql/bin/mysqld_safe datadir=/data/mysql pid=/data/mysql/mysql.pid socket=/tmp/mysql.sock log=/data/mysql/centillion.db.err replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
centillion.db01# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
centillion.db01# crm configure colocation vip_on_mysql inf: vip_192.168.10.200 mysql-clone:Master
centillion.db01# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.10.200:start
centillion.db01# crm node online centillion.db02

Aug 18 11:32:19 centillion crmd: [16378]: info: process_lrm_event: LRM operation mysql:1_monitor_20000 (call=727, rc=8, cib-update=733, confirmed=false) master

Result of master-max=2.

============
Last updated: Tue Aug 18 14:42:37 2015
Stack: Heartbeat
Current DC: centillion.db02 (0302e3d0-df06-4847-b0f9-9ebddfb6aec7) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ centillion.db01 centillion.db02 ]

vip_192.168.10.200      (ocf::heartbeat:IPaddr2):       Started centillion.db01
 Master/Slave Set: mysql-clone
     Masters: [ centillion.db01 centillion.db02 ]

Failed actions:
    mysql:0_demote_0 (node=centillion.db01, call=11, rc=7, status=complete): not running

2015-08-18 16:50:38 7081 [ERROR] Slave I/O: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it). Error_code: 1593

CHANGE MASTER TO MASTER_HOST="centillion.db02", MASTER_USER="repl", MASTER_PASSWORD="slavepass", MASTER_LOG_FILE="mysql-bin.004876", MASTER_LOG_POS=120;

$ sudo yum install OpenIPMI-libs PyXML libaio libtool-ltdl libxslt lm_sensors net-snmp-libs openhpi-libs perl-TimeDate
$ sudo rpm -i cluster-glue-1.0.11-1.1.el5.x86_64.rpm cluster-glue-libs-1.0.11-1.1.el5.x86_64.rpm corosync-1.4.6-1.el5.x86_64.rpm corosynclib-1.4.6-1.el5.x86_64.rpm heartbeat-3.0.5-1.1.el5.x86_64.rpm heartbeat-libs-3.0.5-1.1.el5.x86_64.rpm libesmtp-1.0.4-5.el5.x86_64.rpm pacemaker-1.0.13-2.el5.x86_64.rpm pacemaker-libs-1.0.13-2.el5.x86_64.rpm resource-agents-3.9.5-1.310.0d443.el5.x86_64.rpm

$ cat /etc/corosync/corosync.conf
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.10.0
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

nodelist {
        node {
                ring0_addr: 192.168.10.136
                nodeid: 1
        }

        node {
                ring0_addr: 192.168.10.125
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

$ ocf-tester -n mysql_repl -o binary=/usr/local/mysql/bin/mysqld_safe -o datadir=/data/mysql -o pid=/data/mysql/mysql.pid -o socket=/tmp/mysql.sock -o log=/data/mysql/centillion.db.err -o replication_user=repl -o replication_passwd=slavepass /usr/lib/ocf/resource.d/heartbeat/mysql

$ ocf-tester -n mysql_repl -o binary=/usr/bin/mysqld_safe -o pid=/var/run/mysqld/mysqld.pid -o replication_user=repl -o replication_passwd=slavepass /usr/lib/ocf/resource.d/heartbeat/mysql

Failed actions:
    mysql_monitor_20000 on centillion.db01 'master' (8): call=28, status=complete, last-rc-change='Fri Aug 21 17:22:37 2015', queued=53ms, exec=122ms

Aug 21 17:40:42 [32318] centillion.db02    pengine:    debug: determine_op_status:      mysql_monitor_20000 on centillion.db02 returned 'master' (8) instead of the expected value: 'ok' (0)
Aug 21 17:40:42 [32318] centillion.db02    pengine:  warning: unpack_rsc_op_failure:    Processing failed op monitor for mysql:0 on centillion.db02: master (8)

Normal behavior of mysql_ocf

Mon, 24 Aug 2015 14:56:13 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:13 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:56:15 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:16 +0900: Start method=demote
Mon, 24 Aug 2015 14:56:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:20 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:20 +0900: MySQL monitor succeeded
Mon, 24 Aug 2015 14:56:20 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:20 +0900: MySQL monitor succeeded
Mon, 24 Aug 2015 14:56:24 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:26 +0900: Start method=stop

Abnormal behavior of mysql_ocf

Mon, 24 Aug 2015 14:49:13 +0900: Start method=monitor
Mon, 24 Aug 2015 14:49:13 +0900: myql_common_status info => 7
Mon, 24 Aug 2015 14:49:13 +0900: Start method=start
Mon, 24 Aug 2015 14:49:17 +0900: myql_common_status err => 0
Mon, 24 Aug 2015 14:49:17 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:18 +0900: Start method=monitor
Mon, 24 Aug 2015 14:49:18 +0900: myql_common_status err => 0
Mon, 24 Aug 2015 14:49:18 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=demote
Mon, 24 Aug 2015 14:49:19 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=stop

$ cat /etc/resolv.conf
search SSG5-ISDN
nameserver 192.168.10.139
nameserver 61.122.116.132
nameserver 61.122.116.165
$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
#HWADDR="00:0C:29:DE:3B:44"
NM_CONTROLLED="yes"
ONBOOT="yes"
BOOTPROTO=static
IPADDR=192.168.10.136
NETMASK=255.255.255.0
GATEWAY=192.168.10.1

centillion.db01# crm node standby centillion.db02
centillion.db01# crm configure
crm(live)configure# primitive vip_192.168.10.200 ocf:heartbeat:IPaddr2 params ip="192.168.10.200" cidr_netmask="24" nic="eth0"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node centillion.db01
crm(live)configure# node centillion.db02
crm(live)configure# commit
crm(live)configure# quit
centillion.db01# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
centillion.db01# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
centillion.db01# crm configure colocation vip_on_mysql inf: vip_192.168.10.200 mysql-clone:Master
centillion.db01# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.10.200:start
centillion.db01# crm node online centillion.db02

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| gaitame_chart      |
| gaitame_chart2     |
| mysql              |
| push_chart         |
| stb                |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql -h 192.168.10.200 -u root -p

Enter password: root
