So we tested the 5.6.7-RC. And ran into a strange problem:
Because of a test, a preexisting configuration with GTID enabled existed, and suddenly we did not have properly initialized grants in mysql.* created for a new installation. Turns out: GTID and non-transactional tables are no friends, and that is even documented
When using GTIDs, updates to tables using nontransactional storage engines such as MyISAM are not supported. This is because updates to such tables mixed with updates to tables that use a transactional storage engine such as InnoDB can result in multiple GTIDs being assigned to the same transaction.
Also, this is supposed to work with GRANT and REVOKE, but not with INSERT and DELETE. Now guess what mysql-install-db and friends are using?
server:~ # less /usr/share/mysql/mysql_system_tables_data.sql
INSERT INTO tmp_user VALUES ('localhost','root','',...);
This is a larger problem: We are supposed to use GRANT and REVOKE, but many people are using INSERT and DELETE in mysql.* all of the time, and so do many applications. And the mysql.* tables are MyISAM, and always have been (except that nowadays there is a wild mix of CSV and InnoDB tables in there as well).
MySQL cannot really ship GTID as a feature with MyISAM-tables in mysql.* and expect that to work anywhere. This is all very extremely broken and needs urgent fixing.
This is now support-case SR 3-6270525721: "MySQL 5.6.7-rc1, grants, replication and GTID cause problems" and will also soon have a bug number. And, no, fixing the mysql_system_tables_data.sql is not really removing the problem here.
My colleague at Booking.com, Dennis Kaarsemaker, just blogged an interesting article
about the Replication Load Monitor
we are using at work. This is based on work from Mark Leith
, but presents a much more detailed view.
May it be useful.
So I am getting a mail with a complaint about rising replication delays in a certain replication hierarchy.
Not good, because said hierarchy is one of the important ones. As in 'If that breaks, people are sleeping under the bridge'-important.
The theory was that the change rate in that hierarchy is too high for the single threaded nature of MySQL replication. That was supported by the observation that all affected boxes had no local datadir, but were filer clients. Filer clients as slaves are dying first because the SAN introduces communication latencies that local disks don't have, and the single threaded nature of replication is not helpful here, either. Filers are better when it comes to concurrent accesses, really.
So if that theory would hold that would really ruin my day. Make that month: Said hierarchy is just now recovering from severe refactoring surgery and should have almost no exploitable technical debt that can be leveraged for short term scaling and tuning. If that thing really accumulates delay we are in serious trouble.
Now, I am used to lies. People lie. Boxes lie. So let's fire up the Graphite
and have a look at how bad things are. I am choosing a random host from the supposedly sick hierarchy:
Continue reading "House and Heisenberg having Replication Delay"
I am looking for people who have a bunch of r/o slaves running, and who are using a load balancer to distribute queries across them.
The typical setup would be a PHP or Perl type of deployment with transient connections which end at the end of the page generation, and where a reconnect is being made at the next request serviced. The connect would go to the load balancer, which will forward it to any suitable database in the pool.
I am looking for people who are actually deploying this, and what strategies they have to cope with potential problems. I also would like to better understand what common problems are they needed to address.
Things I can imagine from the top of my head:
- Slave lag. Slave lag can happen on single boxes due to individual failures (battery on raid controller expires) or many boxes (ALTER TABLE logjams hierarchy). In the latter case boxes cannot be dropped from the load balancer lest you end up with an empty pool.
- Identifying problematic machines and isolating faults. At the moment, problematic machines sending requests are easily identified: We can SHOW PROCESSLIST, see the problem query, and the host and port it is coming from. We can find that, lsof on the offending source machine and see what the process is. With an LB inbetween we do lose this ability, unless we do fearful layer 2 magic at the LB. How do you identify sources of disruption elegantly and find them to take them out?
- What is a good pool size? We can unify any number of cells up to an entire data centers capacity from individual cells into one single supercell, but we think that this may be too big a setup. What are sizing guidelines to be used here?
What else am I missing here?
One of the major contributing factors to the success of MySQL is the ease and simplicity of its replication. Read-slaves for scaleout and backup-slaves for noninterrupting backups are the norm in any MySQL installation I have seen in the last two years.
So how does replication work? And how shall it be expanded in the future?
What is available?
The binlog written by MySQL currently logs all statements changing the tablespace. It is a serialization of all tablespace changes. The binlog position, expressed as (binlog name, offset), is a database global timestamp - a timestamp expressed in seconds.fraction does not work for any precision at all, because on a multi-core machine multiple things can happen concurrently.
If you want to make a consistent full backup of the database, the database must not change during the backup. That is, it must be possible to associate one and exactly one binlog position with the backup.
In fact, if you have such a backup - one associated with a binlog position - and you happen to have the binlogs from that time until now, it is possible to do a point-in-time (PIT) recovery. You'd recover from the full backup and you'd then replay the binlog from the backups binlog position until now. That is why it is important to store the binlog in a filesystem that fails independently from the rest of your MySQL. That's also why you must not filter the binlog that is written by MySQL using binlog-do-db and binlog-ignore-db - if you do, you'll get an incomplete binlog that will fail to be useful in a PIT recovery scenario.
A slave in MySQL is now nothing but a binlog downloader and executor: The slave must be restored from a PIT-capable full backup. It is then being told the current binlog position and where to log in to get the missing binlog. The slaves IO_THREAD will then log into the master server and download the binlog to the local disk as fast as possible, storing it as the relay log. The slaves SQL_THREAD will then start to execute the relay log as fast as possible. Replication can thus be thought of as an ongoing live recovery.
Continue reading "Replication - now and then"