MySQL error
2015-08-12 19:32:12 12461 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
2015-08-12 19:32:12 12461 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236

How to set up failover for MySQL

Environment for testing

Server vagrant-db1

  • Hostname: vagrant-db1
  • OS: CentOS 5.7
  • eth0: 10.0.2.15 (for internet service)
  • eth1: 192.168.179.6 (for interconnect)

Server vagrant-db2

  • Hostname: vagrant-db2
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.7 (for interconnect)

Server vagrant-web1

  • Hostname: vagrant-web1
  • OS: CentOS 5.7
  • eth0: 10.0.2.16 (for internet service)
  • eth1: 192.168.179.8 (for interconnect)

VIP

Used by MySQL clients to connect to the current DB master.

  • 192.168.179.100

Specification

Normal operation

  • vagrant-db1 is the DB master, vagrant-db2 is the DB slave, and the VIP points to vagrant-db1.
  • Or vagrant-db2 is the DB master, vagrant-db1 is the DB slave, and the VIP points to vagrant-db2.
  • MySQL clients use the VIP to access the MySQL server (see the example below).
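For example, a client on vagrant-web1 would connect through the VIP rather than to a specific host (a minimal sketch, assuming the mysql client is installed and using whatever credentials were set up on the DB servers):

vagrant-web1$ mysql -h 192.168.179.100 -u root -p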

When failure occurs

Case A: DB master has failure

The DB slave becomes the new DB master, and the VIP moves to point to it.

To recover, set up a new CentOS instance from scratch and configure it as the DB slave (the detailed procedure is shown later under Test A and Test D).

Case B: DB slave has failure

The DB master detects that the DB slave has shut down.

To recover, set up a new CentOS instance from scratch and configure it as the DB slave.

Setup

Install software into vagrant-web1

Install the following software for stress testing.

vagrant-web1$ sudo yum install mysql-bench perl-DBD-MySQL

Install software into vagrant-db1

Install the following software for failover:

  • heartbeat: 3.0.5
  • pacemaker: 1.0.13
  • mysql-server: 5.0.95
vagrant-db1$ sudo yum install mysql-server which
vagrant-db1$ cd /tmp
vagrant-db1$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db1$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db1$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db1$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64

Setup MySQL on vagrant-db1

Set up MySQL.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1$ sudo /sbin/service mysqld start
vagrant-db1$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1$ sudo /sbin/chkconfig mysqld on

Setup Heartbeat on vagrant-db1

Set up Heartbeat.

vagrant-db1$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db1$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db1$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db1$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db1$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log

Start Pacemaker.

vagrant-db1$ sudo /sbin/service heartbeat start

Install software into vagrant-db2

Install the following software for failover.

vagrant-db2$ sudo yum install mysql-server which
vagrant-db2$ cd /tmp
vagrant-db2$ wget 'http://osdn.jp/frs/redir.php?m=iij&f=%2Flinux-ha%2F61792%2Fpacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz'
vagrant-db2$ tar xfz pacemaker-1.0.13-2.1.el5.x86_64.repo.tar.gz
vagrant-db2$ cd pacemaker-1.0.13-2.1.el5.x86_64.repo
vagrant-db2$ sudo yum -c pacemaker.repo install heartbeat.x86_64 pacemaker.x86_64

Setup MySQL on vagrant-db2

Set up MySQL.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2$ sudo /sbin/service mysqld start
vagrant-db2$ mysql -u root -p
mysql> DELETE FROM mysql.user WHERE User = '';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT SUPER,REPLICATION SLAVE,REPLICATION CLIENT,PROCESS ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2$ sudo /sbin/chkconfig mysqld on

Setup Heartbeat on vagrant-db2

Set up Heartbeat.

vagrant-db2$ sudo vi /etc/ha.d/ha.cf
pacemaker on
logfacility local1

debug 0
udpport 694

keepalive 2
warntime 20
deadtime 24
initdead 48

bcast eth1

node vagrant-db1
node vagrant-db2
vagrant-db2$ sudo vi /etc/ha.d/authkeys
auth 1
1 sha1 centabcdefg
vagrant-db2$ sudo chown root:root /etc/ha.d/authkeys
vagrant-db2$ sudo chmod 600 /etc/ha.d/authkeys
vagrant-db2$ sudo vi /etc/syslog.conf
...
*.info;mail.none;authpriv.none;cron.none;local1.none    /var/log/messages
...
# Save pacemaker log
local1.*                                                /var/log/ha-log

Start Pacemaker.

vagrant-db2$ sudo /sbin/service heartbeat start

Check status of Pacemaker

Check the Pacemaker status with the crm_mon command. It takes about one minute for everything to come online.

vagrant-db2$ sudo /usr/sbin/crm_mon
============
Last updated: Fri Jul 10 18:40:44 2015
Stack: Heartbeat
Current DC: vagrant-db2 (ca03e33c-82bb-4da9-bf64-bba48df33141) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ vagrant-db1 vagrant-db2 ]

Setup Pacemaker on vagrant-db1

Set up Pacemaker using the crm command.

vagrant-db1$ sudo bash
vagrant-db1# export PATH=$PATH:/usr/sbin
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# crm configure
crm(live)configure# primitive vip_192.168.179.100 ocf:heartbeat:IPaddr2 params ip="192.168.179.100" cidr_netmask="24" nic="eth1"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node vagrant-db1
crm(live)configure# node vagrant-db2
crm(live)configure# commit
crm(live)configure# quit
vagrant-db1# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
vagrant-db1# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
vagrant-db1# crm configure colocation vip_on_mysql inf: vip_192.168.179.100 mysql-clone:Master
vagrant-db1# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.179.100:start
vagrant-db1# crm node online vagrant-db2

At this point, MySQL replication is ready.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
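
You can also confirm replication from the MySQL side on the slave (a quick check, assuming the root password set during MySQL setup); both replication threads should report Yes:

vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...
mysql> quit;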

Testing replication

Create a table on vagrant-db1

First, try creating a table on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> create database example;
mysql> create table example.dummy (`id` varchar(10));
mysql> show tables in example;
mysql> quit;

Check the table and create another table on vagrant-db2

Check that the table is visible on vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> show tables in example;
mysql> quit;

Switch the database master to vagrant-db2, then try creating a table on vagrant-db2.

vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db2# mysql -u root -p
mysql> create database example2;
mysql> create table example2.dummy (`id` varchar(100));
mysql> show tables in example2;
mysql> quit;

Check the table on vagrant-db1

Check that the table is visible on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> show tables in example2;
mysql> quit;

Testing on test environment

Before testing, vagrant-db1 should be the master and vagrant-db2 the slave.
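
If vagrant-db2 currently holds the master role (for example, right after the switchover in the previous section), it can be moved back with the same standby/online sequence used earlier; a short sketch:

vagrant-db1# crm node standby vagrant-db2
vagrant-db1# sleep 10
vagrant-db1# crm node online vagrant-db2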

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Test A: Vagrant-db1 halts

To emulate this case, simply shut down vagrant-db1.

vagrant-db1# /sbin/shutdown -h now

After this test, vagrant-db2 should be the master.

vagrant-db2# crm_mon
Online: [ vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

To recover, set up a new CentOS instance from scratch and do the following:

  • Install software into vagrant-db1
  • Setup Heartbeat on vagrant-db1

Take a MySQL database dump on vagrant-db2.

vagrant-db2# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000012 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db2# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> UNLOCK TABLES;
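
As an alternative sketch (assuming this mysqldump build supports the --master-data option), you can let mysqldump record the binlog coordinates inside the dump instead of reading SHOW MASTER STATUS by hand; the coordinates then appear as a commented-out CHANGE MASTER statement near the top of the dump file:

vagrant-db2# mysqldump -u root -p -x --master-data=2 --all-databases > /vagrant_data/db.dump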

Then restore the MySQL database on vagrant-db1.

vagrant-db1$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=1
...
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# mysql -u root -p < /vagrant_data/db.dump
vagrant-db1# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db1# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db1.

vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db1 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Make vagrant-db1 the master again and set its master status on vagrant-db2.

vagrant-db1# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# crm node standby vagrant-db2
vagrant-db1# sleep 10
vagrant-db1# crm node online vagrant-db2
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test B: Vagrant-db1 is temporarily disconnected

To emulate this case, simply reboot vagrant-db1.

vagrant-db1# reboot

After this test, vagrant-db2 should be the master and vagrant-db1 the slave.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

Pacemaker fixes this automatically.

Test C: MySQL server on vagrant-db1 is killed

To emulate this case, simply stop the MySQL server on vagrant-db1.

vagrant-db1# /sbin/service mysqld stop

After this test, vagrant-db2 should be the master, and the MySQL resource on vagrant-db1 should be stopped with some failed actions.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Stopped: [ mysql:0 ]

Failed actions:
    mysql:0_monitor_10000 (node=vagrant-db1, call=21, rc=7, status=complete): not running
    mysql:0_demote_0 (node=vagrant-db1, call=64, rc=7, status=complete): not running
    mysql:0_start_0 (node=vagrant-db1, call=68, rc=1, status=complete): unknown error

To fix it, start MySQL and restart Heartbeat/Pacemaker on vagrant-db1.

vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat restart
vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

Test D: Vagrant-db2 halts

To emulate this case, simply shut down vagrant-db2.

vagrant-db2# /sbin/shutdown -h now

After this test, vagrant-db1 should be the master.

vagrant-db1# crm_mon
Online: [ vagrant-db1 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Stopped: [ mysql:1 ]

To recover, set up a new CentOS instance from scratch and do the following:

  • Install software into vagrant-db2
  • Setup Heartbeat on vagrant-db2

Take a MySQL database dump on vagrant-db1.

vagrant-db1# mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000011 |      270 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db1# mysqldump -u root -p -x --all-databases --lock-all-tables > /vagrant_data/db.dump
vagrant-db1# mysqldump -u root -p -x --allow-keywords --lock-all-tables mysql > /vagrant_data/dbuser.dump
vagrant-db1# mysql -u root -p
mysql> UNLOCK TABLES;

Then restore the MySQL database on vagrant-db2.

vagrant-db2$ sudo vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=2
...
vagrant-db2# /sbin/service mysqld start
vagrant-db2# /sbin/chkconfig mysqld on
vagrant-db2# mysql -u root -p < /vagrant_data/db.dump
vagrant-db2# mysql -u root -p mysql < /vagrant_data/dbuser.dump
vagrant-db2# mysql -u root -p
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2# /sbin/service mysqld stop

Restart Pacemaker on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]
OFFLINE: [ vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Make vagrant-db2 the master again and set its master status on vagrant-db1.

vagrant-db2# mysql -u root -p
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000004 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
mysql> QUIT;
vagrant-db2# crm node standby vagrant-db1
vagrant-db2# sleep 10
vagrant-db2# crm node online vagrant-db1
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db2
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]

vagrant-db1# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000004', MASTER_LOG_POS = 98;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test E: Vagrant-db2 is temporarily disconnected

To emulate this case, simply reboot vagrant-db2.

vagrant-db2# reboot

After this test, vagrant-db1 should be the master and vagrant-db2 the slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Test F: MySQL server on vagrant-db2 is killed

To emulate this case, simply stop the MySQL server on vagrant-db2.

vagrant-db2# /sbin/service mysqld stop

After this test, vagrant-db1 should be the master and vagrant-db2 the slave.

vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]

Pacemaker fixes this automatically.

Stress testing on test environment

Before testing, vagrant-db1 should be the master and vagrant-db2 the slave.

How to run mysql-bench

Add "sysbench" user and "test" database on MySQL.

vagrant-db1# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db1# mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON *.* TO 'sysbench'@"%" identified by 'sysbenchpass';
mysql> FLUSH PRIVILEGES;
mysql> CREATE DATABASE test;

Run mysql-bench on vagrant-web1.

vagrant-web1# cd /usr/share/sql-bench
vagrant-web1# while true; do sudo ./test-insert --server=mysql --host=192.168.179.100 --user=sysbench --password=sysbenchpass; done

How to check SLAVE STATUS

Switch the database master to vagrant-db2, then check that SLAVE STATUS shows Yes on vagrant-db1.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db2 ]
     Slaves: [ vagrant-db1 ]
vagrant-db1# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Switch the database master to vagrant-db1, then check that SLAVE STATUS shows Yes on vagrant-db2.

vagrant-db2# /sbin/service heartbeat restart
vagrant-db2# crm_mon
Online: [ vagrant-db1 vagrant-db2 ]

vip_192.168.179.100     (ocf::heartbeat:IPaddr2):       Started vagrant-db1
 Master/Slave Set: mysql-clone
     Masters: [ vagrant-db1 ]
     Slaves: [ vagrant-db2 ]
vagrant-db2# mysql -u root -p
mysql> SHOW SLAVE STATUS\G
...
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
...

Test A': Vagrant-db1 halts, with stress

Do "Test A: Vagrant-db1 halts" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Test B': Vagrant-db1 is temporarily disconnected, with stress

Do "Test B: Vagrant-db1 is temporarily disconnected" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Test C': MySQL server on vagrant-db1 is killed, with stress

Do "Test C: MySQL server on vagrant-db1 is killed" while vagrant-web runs mysql-bench. And stop mysql-bench. Finally, check SLAVE STATUS.

Testing on production environment

Before testing, vagrant-db1 should be the master and vagrant-db2 the slave. The following tests require someone who continuously updates user settings in a web browser.

Test A'': Vagrant-db1 halts while using product

Do "Test A: Vagrant-db1 halts" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test B'': Vagrant-db1 is temporarily disconnected while using product

Do "Test B: Vagrant-db1 is temporarily disconnected" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test C'': MySQL server on vagrant-db1 is killed while using product

Do "Test C: MySQL server on vagrant-db1 is killed" while user-setting is updated continuously. Finally, check SLAVE STATUS.

Test G'': Web servers are temporarily disconnected while using product

Reboot the 1st and 2nd web servers while user settings are being updated continuously. Finally, check SLAVE STATUS.

Troubleshooting

Error "mysql:0_start_0 (node=vagrant-db1 ...): unknown error" occurs on crm_mon

To fix it, run the following on the affected server.

vagrant-db1# /sbin/service heartbeat stop
vagrant-db1# /sbin/chkconfig mysqld on
vagrant-db1# /sbin/service mysqld start
vagrant-db1# /sbin/service heartbeat start

Error "Slave I/O thread: error connecting to master" occurs on mysql.log

vagrant-db2# less /var/log/mysqld.log
150804  8:55:17 [ERROR] Slave I/O thread: error connecting to master 'repl@vagrant-db1:3306': Error: 'Access denied for user 'repl'@'vagrant-db2' (using password: YES)'  errno: 1045  retry-time: 60  retries: 86400

The slave cannot authenticate as the replication user ('repl') against the current master, so replication stops.
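
One possible fix (a sketch, assuming the repl account and password from the setup sections) is to re-create the replication grant on the current master and then restart the slave threads:

vagrant-db1# mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'slavepass';
mysql> FLUSH PRIVILEGES;
mysql> QUIT;
vagrant-db2# mysql -u root -p
mysql> STOP SLAVE;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G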

Error "Could not find first log file name in binary log index file" occurs on mysql.log

vagrant-db1# less /var/log/mysqld.log
150805 21:42:42 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
150805 21:42:42 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log

The slave is requesting a binary log file that the master no longer has in its binlog index. One way to recover is to rebuild the slave from a fresh dump and reset the replication coordinates with CHANGE MASTER, as described in the fix for "Test A: Vagrant-db1 halts".

Error "mysql:1_monitor_20000 (node=vagrant-db2 ...): master" occurs on crm_mon

vagrant-db2# crm_mon
mysql:1_monitor_20000 (node=vagrant-db2, call=9, rc=8, status=complete): master

xxx

============
Last updated: Fri Aug 14 17:28:58 2015
Stack: Heartbeat
Current DC: centillion.db02 (0302e3d0-df06-4847-b0f9-9ebddfb6aec7) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ centillion.db01 centillion.db02 ]

vip_192.168.10.200      (ocf::heartbeat:IPaddr2):       Started centillion.db02
 Master/Slave Set: mysql-clone
     mysql:0    (ocf::heartbeat:mysql): Master centillion.db01 FAILED
     Masters: [ centillion.db02 ]

Failed actions:
    mysql:0_monitor_20000 (node=centillion.db01, call=166, rc=8, status=complete): master
    mysql:0_monitor_30000 (node=centillion.db01, call=167, rc=8, status=complete): master

2015-08-14 17:38:54 17946 [ERROR] Slave SQL: Error 'Can't drop database 'test'; database doesn't exist' on query. Default database: 'test'. Query: 'drop database test', Error_code: 1008
2015-08-14 17:38:54 17946 [Warning] Slave: Can't drop database 'test'; database doesn't exist Error_code: 1008
2015-08-14 17:38:54 17946 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000003' position 120

2015-08-14 17:48:44 27811 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
2015-08-14 17:48:44 27811 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file', Error_code: 1236

2015-08-17 18:54:25 31647 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.6.16-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
2015-08-17 18:54:28 31647 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2015-08-17 18:54:28 31647 [Note] Slave I/O thread: connected to master '[email protected]:3306',replication started in log 'mysql-bin.004861' at position 120
2015-08-17 18:54:28 31647 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2015-08-17 18:54:28 31647 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.004861' at position 120, relay log './mysql-relay-bin.000016' position: 283
2015-08-17 18:54:40 31647 [Note] /usr/local/mysql/bin/mysqld: Normal shutdown
2015-08-17 18:54:40 31647 [Note] Giving 2 client threads a chance to die gracefully
2015-08-17 18:54:40 31647 [Note] Event Scheduler: Purging the queue. 0 events
2015-08-17 18:54:40 31647 [Note] Shutting down slave threads
2015-08-17 18:54:40 31647 [Note] Error reading relay log event: slave SQL thread was killed
2015-08-17 18:54:40 31647 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2015-08-17 18:54:40 31647 [Note] Slave I/O thread killed while reading event
2015-08-17 18:54:40 31647 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.004861', position 120
2015-08-17 18:54:40 31647 [Note] Forcefully disconnecting 0 remaining clients
2015-08-17 18:54:40 31647 [Note] Binlog end

Processes when mysqld is started via /etc/init.d/mysql:

[root@centillion data]# ps aux|grep mysql
root     30119  0.0  0.0 106188  1536 pts/0    S    18:51   0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql --pid-file=/data/mysql/mysql.pid
mysql    30330  0.6 50.1 1713332 963696 pts/0  Sl   18:51   0:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysql/centillion.db01.err --pid-file=/data/mysql/mysql.pid

Processes when mysqld is started via /etc/init.d/heartbeat:

[root@centillion data]# ps aux|grep mysql
root     32171  0.0  0.0   9328  1460 ?        S    18:54   0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock --datadir=/data/mysql --log-error=/var/log/mysqld.log --user=mysql --skip-slave-start
mysql    32430  7.4 50.0 1779132 961680 ?      Sl   18:54   0:00 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --skip-slave-start --log-error=/var/log/mysqld.log --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock
root     32554  0.0  0.0 107448   960 pts/0    S+   18:54   0:00 grep mysql
[root@centillion data]# ps aux|grep mysql
root     32171  0.0  0.0   9328  1460 ?        S    18:54   0:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock --datadir=/data/mysql --log-error=/var/log/mysqld.log --user=mysql --skip-slave-start
mysql    32430  5.2 50.0 1779132 961684 ?      Sl   18:54   0:00 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --skip-slave-start --log-error=/var/log/mysqld.log --pid-file=/data/mysql/mysql.pid --socket=/var/lib/mysql/mysql.sock
root     32581  0.0  0.0 107448   960 pts/0    S+   18:54   0:00 grep mysql

CRM settings for staging.

centillion.db01# crm node standby centillion.db02
centillion.db01# crm configure
crm(live)configure# primitive vip_192.168.10.200 ocf:heartbeat:IPaddr2 params ip="192.168.10.200" cidr_netmask="24" nic="eth0"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node centillion.db01
crm(live)configure# node centillion.db02
crm(live)configure# commit
crm(live)configure# quit
centillion.db01# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/local/mysql/bin/mysqld_safe datadir=/data/mysql pid=/data/mysql/mysql.pid socket=/tmp/mysql.sock log=/data/mysql/centillion.db.err replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
centillion.db01# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
centillion.db01# crm configure colocation vip_on_mysql inf: vip_192.168.10.200 mysql-clone:Master
centillion.db01# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.10.200:start
centillion.db01# crm node online centillion.db02

Aug 18 11:32:19 centillion crmd: [16378]: info: process_lrm_event: LRM operation mysql:1_monitor_20000 (call=727, rc=8, cib-update=733, confirmed=false) master

Result of master-max=2.

============
Last updated: Tue Aug 18 14:42:37 2015
Stack: Heartbeat
Current DC: centillion.db02 (0302e3d0-df06-4847-b0f9-9ebddfb6aec7) - partition with quorum
Version: 1.0.13-a83fae5
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ centillion.db01 centillion.db02 ]

vip_192.168.10.200      (ocf::heartbeat:IPaddr2):       Started centillion.db01
 Master/Slave Set: mysql-clone
     Masters: [ centillion.db01 centillion.db02 ]

Failed actions:
    mysql:0_demote_0 (node=centillion.db01, call=11, rc=7, status=complete): not running

2015-08-18 16:50:38 7081 [ERROR] Slave I/O: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it). Error_code: 1593

CHANGE MASTER TO MASTER_HOST="centillion.db02", MASTER_USER="repl", MASTER_PASSWORD="slavepass", MASTER_LOG_FILE="mysql-bin.004876", MASTER_LOG_POS=120;

$ sudo yum install OpenIPMI-libs PyXML libaio libtool-ltdl libxslt lm_sensors net-snmp-libs openhpi-libs perl-TimeDate
$ sudo rpm -i cluster-glue-1.0.11-1.1.el5.x86_64.rpm cluster-glue-libs-1.0.11-1.1.el5.x86_64.rpm corosync-1.4.6-1.el5.x86_64.rpm corosynclib-1.4.6-1.el5.x86_64.rpm heartbeat-3.0.5-1.1.el5.x86_64.rpm heartbeat-libs-3.0.5-1.1.el5.x86_64.rpm libesmtp-1.0.4-5.el5.x86_64.rpm pacemaker-1.0.13-2.el5.x86_64.rpm pacemaker-libs-1.0.13-2.el5.x86_64.rpm resource-agents-3.9.5-1.310.0d443.el5.x86_64.rpm

$ cat /etc/corosync/corosync.conf
totem {
        version: 2
        crypto_cipher: none
        crypto_hash: none
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.10.0
                mcastport: 5405
                ttl: 1
        }
        transport: udpu
}

logging {
        fileline: off
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

nodelist {
        node {
                ring0_addr: 192.168.10.136
                nodeid: 1
        }

        node {
                ring0_addr: 192.168.10.125
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

$ ocf-tester -n mysql_repl -o binary=/usr/local/mysql/bin/mysqld_safe -o datadir=/data/mysql -o pid=/data/mysql/mysql.pid -o socket=/tmp/mysql.sock -o log=/data/mysql/centillion.db.err -o replication_user=repl -o replication_passwd=slavepass /usr/lib/ocf/resource.d/heartbeat/mysql

$ ocf-tester -n mysql_repl -o binary=/usr/bin/mysqld_safe -o pid=/var/run/mysqld/mysqld.pid -o replication_user=repl -o replication_passwd=slavepass /usr/lib/ocf/resource.d/heartbeat/mysql

Failed actions:
    mysql_monitor_20000 on centillion.db01 'master' (8): call=28, status=complete, last-rc-change='Fri Aug 21 17:22:37 2015', queued=53ms, exec=122ms

Aug 21 17:40:42 [32318] centillion.db02    pengine:    debug: determine_op_status:      mysql_monitor_20000 on centillion.db02 returned 'master' (8) instead of the expected value: 'ok' (0)
Aug 21 17:40:42 [32318] centillion.db02    pengine:  warning: unpack_rsc_op_failure:    Processing failed op monitor for mysql:0 on centillion.db02: master (8)

Normal behavior of mysql_ocf

Mon, 24 Aug 2015 14:56:13 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:13 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:56:15 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:16 +0900: Start method=demote
Mon, 24 Aug 2015 14:56:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:20 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:20 +0900: MySQL monitor succeeded
Mon, 24 Aug 2015 14:56:20 +0900: Start method=monitor
Mon, 24 Aug 2015 14:56:20 +0900: MySQL monitor succeeded
Mon, 24 Aug 2015 14:56:24 +0900: Start method=notify
Mon, 24 Aug 2015 14:56:26 +0900: Start method=stop

Abnormal behavior of mysql_ocf

Mon, 24 Aug 2015 14:49:13 +0900: Start method=monitor
Mon, 24 Aug 2015 14:49:13 +0900: myql_common_status info => 7
Mon, 24 Aug 2015 14:49:13 +0900: Start method=start
Mon, 24 Aug 2015 14:49:17 +0900: myql_common_status err => 0
Mon, 24 Aug 2015 14:49:17 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:18 +0900: Start method=monitor
Mon, 24 Aug 2015 14:49:18 +0900: myql_common_status err => 0
Mon, 24 Aug 2015 14:49:18 +0900: MySQL monitor succeeded (master)
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:18 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=demote
Mon, 24 Aug 2015 14:49:19 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=notify
Mon, 24 Aug 2015 14:49:19 +0900: Start method=stop

$ cat /etc/resolv.conf
search SSG5-ISDN
nameserver 192.168.10.139
nameserver 61.122.116.132
nameserver 61.122.116.165
$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
#HWADDR="00:0C:29:DE:3B:44"
NM_CONTROLLED="yes"
ONBOOT="yes"
BOOTPROTO=static
IPADDR=192.168.10.136
NETMASK=255.255.255.0
GATEWAY=192.168.10.1

centillion.db01# crm node standby centillion.db02
centillion.db01# crm configure
crm(live)configure# primitive vip_192.168.10.200 ocf:heartbeat:IPaddr2 params ip="192.168.10.200" cidr_netmask="24" nic="eth0"
crm(live)configure# property no-quorum-policy="ignore" stonith-enabled="false"
crm(live)configure# node centillion.db01
crm(live)configure# node centillion.db02
crm(live)configure# commit
crm(live)configure# quit
centillion.db01# crm
crm(live)# cib new mysql_repl
crm(mysql_repl)# configure primitive mysql ocf:heartbeat:mysql params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid replication_user=repl replication_passwd=slavepass op start interval=0 timeout=120s op stop interval=0 timeout=120s op monitor interval=20s timeout=30s op monitor interval=10s role=Master timeout=30s op monitor interval=30s role=Slave timeout=30s op promote interval=0 timeout=120s op demote interval=0 timeout=120s op notify interval=0 timeout=90s
crm(mysql_repl)# cib commit mysql_repl
crm(mysql_repl)# quit
centillion.db01# crm configure ms mysql-clone mysql meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
centillion.db01# crm configure colocation vip_on_mysql inf: vip_192.168.10.200 mysql-clone:Master
centillion.db01# crm configure order vip_after_mysql inf: mysql-clone:promote vip_192.168.10.200:start
centillion.db01# crm node online centillion.db02

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| gaitame_chart      |
| gaitame_chart2     |
| mysql              |
| push_chart         |
| stb                |
| test               |
+--------------------+
7 rows in set (0.00 sec)

mysql -h 192.168.10.200 -u root -p

Enter password: root
