Few things feel worse than rolling out a High Availability (HA) system, then regularly watching that system collapse. For our team at Rails Machine, that failing HA system was MySQL Multi-Master Replication Manager (MMM).
We'd been searching for an MMM replacement for a while, and a few months ago, we made the switch to MariaDB + Galera Cluster for High Availability MySQL. What's wrong with MySQL MMM? What's special about Galera Cluster? Read on!
How MySQL MMM works: a server with MySQL MMM installed polls the MySQL nodes every ten seconds, checking their health. Only one node holds the writer role at a time - the rest can hold reader roles. MMM maintains a virtual IP that points to the node with the writer role.
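For readers who never ran MMM, the roles and virtual IPs live in its configuration. Here's a trimmed, illustrative sketch of an mmm_common.conf (the host names and IPs are made up):

```
<host db1>
    ip      192.168.0.11
    mode    master
    peer    db2
</host>

<host db2>
    ip      192.168.0.12
    mode    master
    peer    db1
</host>

# Exactly one host holds the writer role (and its virtual IP) at any time.
<role writer>
    hosts   db1, db2
    ips     192.168.0.100
    mode    exclusive
</role>

# Reader virtual IPs are balanced across the healthy hosts.
<role reader>
    hosts   db1, db2
    ips     192.168.0.101, 192.168.0.102
    mode    balanced
</role>
```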
The problem is the polling: if MySQL MMM polls every ten seconds (the default), what happens when the writer node fails between checks? If you need an HA setup, you're probably handling many transactions - you could have hundreds of thousands of failed transactions before MMM detects that the writer node isn't healthy. Worse, what if there's an internal issue where replication fails first, transactions then start failing, and MMM switches the writer role to the other node? The other node won't be consistent with the original writer.
Decreasing the polling interval to one second won't fix it - a busy database likely handles many transactions per second.
So, polling is the fundamental issue, but beyond fundamental issues, MySQL MMM in the wild often results in hard-to-recover problems. Here's Baron Schwartz, MySQL God at Percona, on MMM's flaws:
> In short, MMM causes more downtime than it prevents. It’s a Low-Availability tool, not a High-Availability tool. It only takes one really good serious system-wide mess to take you down for a couple of days, working 24×7 trying to scrape your data off the walls and put it back into the server. MMM brings new meaning to the term “cluster-f__k”.
Despite MMM's flaws, it was the least broken way to do HA for MySQL for a time. But, times have changed. Even the creator of MySQL MMM says it's time for a change. Here's Alexey's comment on Baron's blog post on MMM:
> I’m the original author of MMM and I completely agree with you. Every time I try to add HA to my clusters I remember MMM and want to stab myself because I simply could not trust my data to the tool and there is nothing else available on the market to do the job reliably.
Enter Galera Cluster. While node health is still determined by polling, a failing node won't cause your database to enter an unrecoverable state. Why is a solution that still does health checks via polling better than the old way?
The answer lies in how replication works. With standard MySQL, writes to the master are recorded in a binary log. Slaves then replay the queries from the binary log. There is always a delay between when a query runs on the master and when the other nodes run it - the replication is asynchronous.
MySQL asynchronous replication has the following issues (a quick way to observe the lag is sketched after the list):
- It's common for a slave's dataset to lag behind that of the master.
- MySQL replication is slow - each slave re-executes the transactions from the binary log, one at a time.
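To make the lag concrete, here's a minimal check using standard MySQL/MariaDB statements (nothing here is specific to our setup):

```sql
-- Run on a slave; \G is the mysql client's vertical-output terminator.
SHOW SLAVE STATUS\G

-- In the output, Seconds_Behind_Master is the replication lag:
--   0     = the slave has caught up (for the moment)
--   N > 0 = the slave is N seconds behind the master
--   NULL  = replication is broken entirely
```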
With Galera, transactions are confirmed across all nodes before they are committed. If a transaction fails on a node, that node is immediately removed from the cluster. In other words, Galera replication is synchronous. You never lose a transaction - there is no lag (and Galera's row-based replication is around 5x faster).
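You can watch this behavior from any node: Galera exposes its replication state through `wsrep_%` status variables (the variable names below are standard; the comments describe typical healthy values):

```sql
-- Run on any Galera node:
SHOW STATUS LIKE 'wsrep_ready';               -- ON when the node is accepting queries
SHOW STATUS LIKE 'wsrep_cluster_size';        -- number of nodes currently in the cluster
SHOW STATUS LIKE 'wsrep_local_state_comment'; -- Synced, Donor/Desynced, Joining, ...
```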
MySQL MMM is an outsider - it's "dumb" about what's actually going on inside the server. It does one kind of check, and that's all it knows how to react to. Galera Cluster is an "insider" and can therefore be smarter about the internal state of each node and do the right thing without manual intervention (a node being synced or unsynced, becoming a donor, etc. - it's all automatic).
While it's possible to write to any node in a Galera Cluster, we've chosen to minimize potential deadlocks by writing to a single node only. To do this, we use HAProxy: we have a frontend for the writer node and a frontend for reads that balances queries across all of the nodes. The writer frontend sends queries to a single node; the other nodes are backups.
If HAProxy detects that the writer node isn't healthy, it immediately promotes one of the backup nodes to writer. In this situation, MySQL MMM would frequently end up taking all of the nodes offline - HAProxy doesn't. We may lose a couple of queries while the writer backend is updated, but it won't result in an inconsistent data set across servers, which is far more crippling.
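Here's a trimmed sketch of what that HAProxy setup can look like - a minimal sketch under our assumptions, not a literal production config (the IPs, ports, and server names are made up; the health check assumes a cluster-check service listening on port 9200, sketched in the monitoring section below):

```
# Writer: all queries go to db1; db2/db3 only take over if db1 fails its check.
listen mysql-writer
    bind 10.0.0.5:3306
    mode tcp
    option httpchk                              # HTTP health check instead of a bare TCP check
    server db1 10.0.0.11:3306 check port 9200
    server db2 10.0.0.12:3306 check port 9200 backup
    server db3 10.0.0.13:3306 check port 9200 backup

# Reader: queries are balanced across every healthy node.
listen mysql-reader
    bind 10.0.0.5:3307
    mode tcp
    balance roundrobin
    option httpchk
    server db1 10.0.0.11:3306 check port 9200
    server db2 10.0.0.12:3306 check port 9200
    server db3 10.0.0.13:3306 check port 9200
```

With the standby nodes marked `backup`, HAProxy only promotes one of them when db1's health check fails - exactly the failover behavior described above.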
We don't automatically repair failed nodes, but that's OK. My primary concern is just making sure a healthy node is taking writes, and HAProxy does that.
When a node fails with standard MySQL replication, you put an intense load on one server while getting replication set up again (that single server takes not only the reads and writes but also the load from innobackupex). With Galera, one node is taken out of rotation to rebuild the failed node (which is why you need a minimum of three nodes). That node becomes the donor - write operations to it are blocked while it transfers its data to the failed node (or new node) via rsync. Then, both the donor node and the rebuilt node catch up by replaying queries from the slave queue. Once they are both back to a Synced state, HAProxy automatically marks them as Up and adds them back to the frontend.
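For reference, here's a minimal sketch of the Galera-specific server configuration that enables this (the paths, addresses, and cluster name are examples; the `wsrep_*` option names are Galera's standard ones):

```ini
# my.cnf - Galera-relevant excerpt
[mysqld]
binlog_format         = ROW                               # Galera requires row-based replication
wsrep_provider        = /usr/lib/galera/libgalera_smm.so  # path varies by distribution
wsrep_cluster_address = gcomm://10.0.0.11,10.0.0.12,10.0.0.13
wsrep_cluster_name    = example_cluster
wsrep_sst_method      = rsync                             # full state transfers via rsync, as described above
```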
Our reasons for switching to MariaDB are a mix of technical and political:
- Easy Migration: First, MariaDB is a drop-in replacement for MySQL. Migrating from MySQL 5.1 to MariaDB Galera Server 5.6 just worked.
- Performance: Several queries that were covered by indexes but still gave us performance problems before the migration "fixed" themselves afterwards, which was a nice surprise. MariaDB seems to be much better at the complex queries and joins that Ruby on Rails is famous for: on the whole it does a better job of choosing indexes, and as I said, a lot of the queries that plagued us are now speedy. We also haven't seen much, if any, difference in memory usage, which was surprising - I expected more overhead from Galera, but if there is any, it's not noticeable.
- Community: MariaDB has a lot of momentum and is adding features faster than MySQL. With the many concerns about the future of MySQL under Oracle, MariaDB looks like it will be around for a long while - even Google is switching to MariaDB.
Would we recommend the switch? Absolutely. We don't see any reason not to switch to both MariaDB and Galera Cluster.
The great thing about monitoring Galera is that it's layered - we can easily monitor each part of the stack. We use Scout for monitoring, which means we just need the following plugins:
- MariaDB Galera Cluster Monitor - install on each Galera Cluster node to fetch key metrics (Local State, Connected, etc.).
- HAProxy - install for each proxy (once for the writer, once for the readers).
- URL Monitoring - check the same status URL that HAProxy checks to determine node health (a sketch of such a check follows below).
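That status URL is usually served by a tiny health-check script on each database node, run as an xinetd service that HAProxy (and Scout) hit over HTTP. A minimal sketch of the idea, assuming a local mysql client and a dedicated check user (the script name, user, and password are made up):

```sh
#!/bin/sh
# mysqlchk: answer an HTTP request with this Galera node's health.
# wsrep_local_state is 4 when the node is Synced and safe to serve queries.

STATE=$(mysql -u clustercheck -pSECRET -N -s \
    -e "SHOW STATUS LIKE 'wsrep_local_state';" | awk '{print $2}')

if [ "$STATE" = "4" ]; then
    printf "HTTP/1.1 200 OK\r\n\r\nGalera node is Synced.\r\n"
else
    printf "HTTP/1.1 503 Service Unavailable\r\n\r\nGalera node is not Synced.\r\n"
fi
```

HAProxy's `option httpchk` (port 9200 in the earlier sketch) and Scout's URL monitoring can then hit the same endpoint, so both see identical node health.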
Until recently, MySQL MMM was the best (but broken) way to add High Availability to MySQL. Galera Cluster finally adds true High Availability to MySQL, primarily thanks to synchronous replication.
A final note on monitoring: because it's layered, you can monitor the same thing HAProxy monitors (the xinetd health-check service) with Scout, and also monitor the number of Synced nodes as reflected by HAProxy. If something goes wrong, you get feedback from both sources, so you can quickly tell whether the problem is with HAProxy or with MariaDB and fix the right place.