Skip to content

Dude, where is my memory?

"Kris, please have a look at our database. We are running a materialized dataset processor, and on a database with about 40 GB of buffer pool and a database size of 6 GB, we are observing swapping."

Now, that is interesting. The box in question has 48 GB of memory, and indeed, hardly 6 GB of data.

mysql> select 
 -> sum(data_length+index_length)/1024/1024/1024 as gb 
 -> from tables 
 -> where table_schema not in ('information_schema', 'performance_schema', 'mysql');
| gb             |
| 5.832778930664 |
1 row in set (0.00 sec)

Yet, at the same time in "top", and growing:
 7552 mysql     15   0 55.1g  43g 6888 S  0.7 91.7 499:13.56 mysqld 

That's going to be interesting!

Continue reading "Dude, where is my memory?"

.mylogin.cnf password recovery

As Todd Farmer points out in Understanding mysql_config_editor’s security aspects, the new .mylogin.cnf file generated by mysql_config_editor does not securely store the password used to login to the database. It just obfuscates it.

The format of the file is as follows (as of MySQL 5.6.7-RC):
  • 4 Bytes Zero (Version Information)
  • 20 Bytes Key Generation Matter
  • Repeated:
    • 4 Bytes Length information
    • Length bytes crypted matter. The crypt is done using the AES ENCRYPT function, which in itself is insecure: It is an aes-128-ecb with a NULL IV.

The key used by AES 128 needs to be CHAR(16), but the function accepts any string as a key generation matter. It generates the key from the key generation matter by xor-ing the key generation matter onto itself in a 16 byte loop, starting with a buffer of NULL bytes.

In Code: Continue reading ".mylogin.cnf password recovery"


So we tested the 5.6.7-RC. And ran into a strange problem:

Because of a test, a preexisting configuration with GTID enabled existed, and suddenly we did not have properly initialized grants in mysql.* created for a new installation. Turns out: GTID and non-transactional tables are no friends, and that is even documented.

When using GTIDs, updates to tables using nontransactional storage engines such as MyISAM are not supported. This is because updates to such tables mixed with updates to tables that use a transactional storage engine such as InnoDB can result in multiple GTIDs being assigned to the same transaction.

Also, this is supposed to work with GRANT and REVOKE, but not with INSERT and DELETE. Now guess what mysql-install-db and friends are using?

server:~ # less /usr/share/mysql/mysql_system_tables_data.sql
INSERT INTO tmp_user VALUES ('localhost','root','',...);

This is a larger problem: We are supposed to use GRANT and REVOKE, but many people are using INSERT and DELETE in mysql.* all of the time, and so do many applications. And the mysql.* tables are MyISAM, and always have been (except that nowadays there is a wild mix of CSV and InnoDB tables in there as well).

MySQL cannot really ship GTID as a feature with MyISAM-tables in mysql.* and expect that to work anywhere. This is all very extremely broken and needs urgent fixing.

This is now support-case SR 3-6270525721: "MySQL 5.6.7-rc1, grants, replication and GTID cause problems" and will also soon have a bug number. And, no, fixing the mysql_system_tables_data.sql is not really removing the problem here.

House and Heisenberg having Replication Delay

So I am getting a mail with a complaint about rising replication delays in a certain replication hierarchy.

Not good, because said hierarchy is one of the important ones. As in 'If that breaks, people are sleeping under the bridge'-important.

The theory was that the change rate in that hierarchy is too high for the single threaded nature of MySQL replication. That was supported by the observation that all affected boxes had no local datadir, but were filer clients. Filer clients as slaves are dying first because the SAN introduces communication latencies that local disks don't have, and the single threaded nature of replication is not helpful here, either. Filers are better when it comes to concurrent accesses, really.

So if that theory would hold that would really ruin my day. Make that month: Said hierarchy is just now recovering from severe refactoring surgery and should have almost no exploitable technical debt that can be leveraged for short term scaling and tuning. If that thing really accumulates delay we are in serious trouble.

Now, I am used to lies. People lie. Boxes lie. So let's fire up the Graphite and have a look at how bad things are. I am choosing a random host from the supposedly sick hierarchy:

Continue reading "House and Heisenberg having Replication Delay"

MySQL 5.6.4-m7 is made of pure win

I have an internal project that is generating a materialized view of some pretty important data, using 96 application server cores against a 12 core database with 192g of memory and a buttload of SSD, good for about 250 MB/sec peak write rate in synthetic file creation.

The project was underperforming, even with MySQL 5.5-current. We never had more than 15mb/s sustained write rate, and never more than 24G of dirty pages in innodb. Looking at the appropriate (self-made) merlin graphs showed redo log contention and excessive checkpointing.

Install MySQL 5.6.4-m7, which can do large redo logs. A quick back-of-the-napkin calculation shows that around 32GB of redo log are just fine for our worklog, most likely. So I configure 2 ib_logfile of 16000M each, and we turn on the load.

The result is a write rate of 100mb/sec sustained, until the redo log reaches about 24G in size. Then queries stall, the internal stall protection kicks in and backs off for a minute or two, and the database checkpoints furiously. The load comes back on after the timeout, and the database stabilizes nicely at 24GB active redo log and with ongoing checkpointing, 150 MB/sec aggregated write rate and a load of 11 (at 12 cores, that is just dandy).

Since nothing exciting happens, I kill -9 the server, hard, with 24G active redo log. I get recovery after 37 minutes, with no data lost.

120327 18:18:19 mysqld_safe Starting mysqld_using_numactl daemon with databases from /mysql/fav/data
120327 18:18:19 [Note] Plugin 'FEDERATED' is disabled.
120327 18:18:29 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 230713314587
120327 18:18:30  InnoDB: Database was not shut down normally!
InnoDB: Doing recovery: scanned up to log sequence number 254522008304
InnoDB: 2 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 668 row operations to undo
InnoDB: Trx id counter is 7581696
120327 18:28:23  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 500411015, file name ../log/binlog.000089
120327 18:55:24InnoDB: Starting in background the rollback of uncommitted transactions
 InnoDB: 128 rollback segment(s) are active.
120327 18:55:24  InnoDB: Rolling back trx with id 7581431, 238 rows to undo
120327 18:55:25 InnoDB: Waiting for the background threads to start

InnoDB: Rolling back of trx id 7581431 completed
120327 18:55:25  InnoDB: Rolling back trx with id 7581430, 430 rows to undo

InnoDB: Rolling back of trx id 7581430 completed
120327 18:55:25  InnoDB: Rollback of non-prepared transactions completed
120327 18:55:26 InnoDB: 1.2.4 started; log sequence number 254522008304
120327 18:55:30 [Note] /usr/sbin/mysqld: ready for connections.

This is completely made out of win. I am really, really, really looking forward for the 5.6 GA release. Well done, Oracle-MySQL!

(Note: Large Redo logs are also available with Percona XtraDB and MariaDB)

Continue reading "MySQL 5.6.4-m7 is made of pure win"

How large can a MySQL database become?

In Maximum MySQL Database Size? Nick Duncan wants to find out what the maximum size of his MySQL database can possibly be. He answers that with a list of maximum file sizes per file system type. That is not a useful answer.

While every file system does have a maximum file size, this limitation is usually not relevant when it comes to MySQL maximum database size. But let's start with file systems, anyway.

First: You never want to run a database system on a FAT filesystem, ever. In FAT, a file is a linked list of blocks in the FAT. That is, certain "seek" (backwards seek operations) operations become slower the larger a file is, because the file system has to position the file pointer by traversing the linked list of blocks in the FAT. Since seek operations are basically what a large database does all day, FAT is completely useless for this. So the actual file size limit of FAT is kind of moot for the purpose of this discussion.

Second: You also never want to run a database system on a 32 bit operating system. Not only does that limit your file size in Windows and also in certain ways in Linux, it will also limit the amount of system memory you can invest into MySQL buffer caches. That's kind of useless, because it is memory that makes databases fast.

In 64 bit systems and with modern file systems (NTFS in Windows and XFS on LVM2 in Linux, on a recent kernel), the operating imposed file size limit is multiple terabytes or petabytes, even. We will soon see that the exact number is not really relevant.

Continue reading "How large can a MySQL database become?"

MySQL is destroying my Umlauts

This is the english translation of an article in my german blog. This article, like the german original, is licensed CC-BY-SA. The english translation has been kindly provided by Tobias Klausmann.

Recently, I had to explain this to several people, hence a writeup for the blog for easier reference. The question:
I have content in my database that can be sucessfully read and written by my application, but if I do a mysqldump to transfer the data to a new system, all the non-ASCII characters like Umlauts are destroyed.
This happens if you save data to a DB with the wrong text encoding label.

In MySQL, every string has a label that describes the character encoding the string was written in (and should be interpreted in). The string _latin1"Köhntopp" thus (hopefully) is the character sequence K-0xF6-hntopp and the string _utf8"Köhntopp" consequently should be K-0xC3 0xB6-hntopp. Problems arise as soon as the label (_latin1 or _utf8) does not match the encoding inside the string (0xF6 vs. 0xC3 0xB6).

This is outlined in more detail in Handling character sets, and you should have read that article before you continue.

Continue reading "MySQL is destroying my Umlauts"

pam modules for MySQL: What is wrong with these people?

Percona just released their MySQL PAM Authentication insanity, just as Oracle did before, for MySQL 5.5 and MariaDB is no better.

The Oracle module requires a module to be loaded into your client, which is done automatically if the module is present and the server supports PAM auth. The module is called ominously "mysql_clear_password" and does what it says on the tin: Your database server access password is henceforth sent from the client to the server in clear, not encrypted, hashed, salted or otherwise protected.

I suppose the Percona module does the same, although it is not being mentioned in the docs at all (or at least I have not been able to find it in there). They also openly suggest to run the database server as root, as that is the only way for an in-process PAM auth module to be able to access /etc/shadow.


Does any of you know what SASL is and why it has been invented?

I know it's a pain, but it is there for a reason. Many reasons. saslauthd for example will read your authentication secrets instead of your worker process, because you are unable to write and maintain a secure codebase the size of a database server. And by speaking SASL on the wire and then handing off an authenticated connection to your actual worker code you gain access to a number of integrated mechanisms for communicating passwords in a compatible and secure manner, none of which include clear text passwords on the wire.

Can we please bury these plugins, deeply in the Mariana trench, in a CASTOR, put a warning beacon over the site and then start over, doing it right this time?

Thanks. I knew you would see the light eventually.

Call for best practice: Talking to r/o slaves through a load-balancer

I am looking for people who have a bunch of r/o slaves running, and who are using a load balancer to distribute queries across them.

The typical setup would be a PHP or Perl type of deployment with transient connections which end at the end of the page generation, and where a reconnect is being made at the next request serviced. The connect would go to the load balancer, which will forward it to any suitable database in the pool.

I am looking for people who are actually deploying this, and what strategies they have to cope with potential problems. I also would like to better understand what common problems are they needed to address.

Things I can imagine from the top of my head:

- Slave lag. Slave lag can happen on single boxes due to individual failures (battery on raid controller expires) or many boxes (ALTER TABLE logjams hierarchy). In the latter case boxes cannot be dropped from the load balancer lest you end up with an empty pool.

- Identifying problematic machines and isolating faults. At the moment, problematic machines sending requests are easily identified: We can SHOW PROCESSLIST, see the problem query, and the host and port it is coming from. We can find that, lsof on the offending source machine and see what the process is. With an LB inbetween we do lose this ability, unless we do fearful layer 2 magic at the LB. How do you identify sources of disruption elegantly and find them to take them out?

- What is a good pool size? We can unify any number of cells up to an entire data centers capacity from individual cells into one single supercell, but we think that this may be too big a setup. What are sizing guidelines to be used here?

What else am I missing here?

LOAD DATA INFILE (and mysqldump)

A colleague of mine has been benchmarking mysqldump data load vs. various versions of LOAD DATA INFILE. He created a sample data as a text file with either 100k or 20M rows of five integers each, the first column of which is the pk.
perl -MList::Util=shuffle -e '@k=shuffle(1..20e6);
  for (@k) {
    print $_, "    ", join("    ", map int(rand(1e9)), 0..3), "\n";
}' > loadme_nonpkorder.txt

perl -e 'print ++$i, "    ", join("    ", map int(rand(1e9)), 0..3), "\n" 
  for 1..20e6' > loadme_pkorder.txt

All insertion has been done on empty and new tables. The text files we read at least once before to warm up the OS disk cache. The tables have two non-unique single-column indexes. All happens on a idle-ish DB master with some substantial memory and a NetApp hosting the datadir (via XFS and LVM).

He benchmarked four cases:
  1. Insertion in PK order.
  2. Insertion in PK order, dropping indexes before insertion and re-adding them later.
  3. Insertion in random order.
  4. Insertion in random order, dropping indexes before insertion and re-adding them later.

Summary: The result is not surprising: Both using PK order and dropping/re-adding indexes improves performance considerably. The PK order insertion becomes more and more crucial with a larger dataset (which is not at all surprising if you think about what happens when adding a record to the innodb PK tree).

Continue reading "LOAD DATA INFILE (and mysqldump)"

Fighting the mysqld init script

Today we discovered a particularly subtle way of fucking up a server restart. After a routine configuration change and RPM upgrade, a colleague tried to restart an important master. That failed. The message:
root@master ~]# /etc/init.d/mysql start
Starting MySQLCouldn't find MySQL manager (//bin/mysqlmanag[FAILED]erver (//bin/mysqld_safe)

The colleague tried multiple times, and finally resorted to manually typing a
nohup mysqld_safe ...
into a screen console, which he detached.

That took care of production for now and left us with an investigation. Why is the init script trying to start the MySQL manager?

It is not, and never tried to. What happen?

Continue reading "Fighting the mysqld init script"