Skip to content

Dude, where is my memory?

"Kris, please have a look at our database. We are running a materialized dataset processor, and on a database with about 40 GB of buffer pool and a database size of 6 GB, we are observing swapping."

Now, that is interesting. The box in question has 48 GB of memory, and indeed, hardly 6 GB of data.

mysql> select 
 -> sum(data_length+index_length)/1024/1024/1024 as gb 
 -> from tables 
 -> where table_schema not in ('information_schema', 'performance_schema', 'mysql');
| gb             |
| 5.832778930664 |
1 row in set (0.00 sec)

Yet, at the same time in "top", and growing:
 7552 mysql     15   0 55.1g  43g 6888 S  0.7 91.7 499:13.56 mysqld 

That's going to be interesting!

Continue reading "Dude, where is my memory?"

.mylogin.cnf password recovery

As Todd Farmer points out in Understanding mysql_config_editor’s security aspects, the new .mylogin.cnf file generated by mysql_config_editor does not securely store the password used to login to the database. It just obfuscates it.

The format of the file is as follows (as of MySQL 5.6.7-RC):
  • 4 Bytes Zero (Version Information)
  • 20 Bytes Key Generation Matter
  • Repeated:
    • 4 Bytes Length information
    • Length bytes crypted matter. The crypt is done using the AES ENCRYPT function, which in itself is insecure: It is an aes-128-ecb with a NULL IV.

The key used by AES 128 needs to be CHAR(16), but the function accepts any string as a key generation matter. It generates the key from the key generation matter by xor-ing the key generation matter onto itself in a 16 byte loop, starting with a buffer of NULL bytes.

In Code: Continue reading ".mylogin.cnf password recovery"


So we tested the 5.6.7-RC. And ran into a strange problem:

Because of a test, a preexisting configuration with GTID enabled existed, and suddenly we did not have properly initialized grants in mysql.* created for a new installation. Turns out: GTID and non-transactional tables are no friends, and that is even documented.

When using GTIDs, updates to tables using nontransactional storage engines such as MyISAM are not supported. This is because updates to such tables mixed with updates to tables that use a transactional storage engine such as InnoDB can result in multiple GTIDs being assigned to the same transaction.

Also, this is supposed to work with GRANT and REVOKE, but not with INSERT and DELETE. Now guess what mysql-install-db and friends are using?

server:~ # less /usr/share/mysql/mysql_system_tables_data.sql
INSERT INTO tmp_user VALUES ('localhost','root','',...);

This is a larger problem: We are supposed to use GRANT and REVOKE, but many people are using INSERT and DELETE in mysql.* all of the time, and so do many applications. And the mysql.* tables are MyISAM, and always have been (except that nowadays there is a wild mix of CSV and InnoDB tables in there as well).

MySQL cannot really ship GTID as a feature with MyISAM-tables in mysql.* and expect that to work anywhere. This is all very extremely broken and needs urgent fixing.

This is now support-case SR 3-6270525721: "MySQL 5.6.7-rc1, grants, replication and GTID cause problems" and will also soon have a bug number. And, no, fixing the mysql_system_tables_data.sql is not really removing the problem here.

House and Heisenberg having Replication Delay

So I am getting a mail with a complaint about rising replication delays in a certain replication hierarchy.

Not good, because said hierarchy is one of the important ones. As in 'If that breaks, people are sleeping under the bridge'-important.

The theory was that the change rate in that hierarchy is too high for the single threaded nature of MySQL replication. That was supported by the observation that all affected boxes had no local datadir, but were filer clients. Filer clients as slaves are dying first because the SAN introduces communication latencies that local disks don't have, and the single threaded nature of replication is not helpful here, either. Filers are better when it comes to concurrent accesses, really.

So if that theory would hold that would really ruin my day. Make that month: Said hierarchy is just now recovering from severe refactoring surgery and should have almost no exploitable technical debt that can be leveraged for short term scaling and tuning. If that thing really accumulates delay we are in serious trouble.

Now, I am used to lies. People lie. Boxes lie. So let's fire up the Graphite and have a look at how bad things are. I am choosing a random host from the supposedly sick hierarchy:

Continue reading "House and Heisenberg having Replication Delay"

How large can a MySQL database become?

In Maximum MySQL Database Size? Nick Duncan wants to find out what the maximum size of his MySQL database can possibly be. He answers that with a list of maximum file sizes per file system type. That is not a useful answer.

While every file system does have a maximum file size, this limitation is usually not relevant when it comes to MySQL maximum database size. But let's start with file systems, anyway.

First: You never want to run a database system on a FAT filesystem, ever. In FAT, a file is a linked list of blocks in the FAT. That is, certain "seek" (backwards seek operations) operations become slower the larger a file is, because the file system has to position the file pointer by traversing the linked list of blocks in the FAT. Since seek operations are basically what a large database does all day, FAT is completely useless for this. So the actual file size limit of FAT is kind of moot for the purpose of this discussion.

Second: You also never want to run a database system on a 32 bit operating system. Not only does that limit your file size in Windows and also in certain ways in Linux, it will also limit the amount of system memory you can invest into MySQL buffer caches. That's kind of useless, because it is memory that makes databases fast.

In 64 bit systems and with modern file systems (NTFS in Windows and XFS on LVM2 in Linux, on a recent kernel), the operating imposed file size limit is multiple terabytes or petabytes, even. We will soon see that the exact number is not really relevant.

Continue reading "How large can a MySQL database become?"

MySQL is destroying my Umlauts

This is the english translation of an article in my german blog. This article, like the german original, is licensed CC-BY-SA. The english translation has been kindly provided by Tobias Klausmann.

Recently, I had to explain this to several people, hence a writeup for the blog for easier reference. The question:
I have content in my database that can be sucessfully read and written by my application, but if I do a mysqldump to transfer the data to a new system, all the non-ASCII characters like Umlauts are destroyed.
This happens if you save data to a DB with the wrong text encoding label.

In MySQL, every string has a label that describes the character encoding the string was written in (and should be interpreted in). The string _latin1"Köhntopp" thus (hopefully) is the character sequence K-0xF6-hntopp and the string _utf8"Köhntopp" consequently should be K-0xC3 0xB6-hntopp. Problems arise as soon as the label (_latin1 or _utf8) does not match the encoding inside the string (0xF6 vs. 0xC3 0xB6).

This is outlined in more detail in Handling character sets, and you should have read that article before you continue.

Continue reading "MySQL is destroying my Umlauts"

pam modules for MySQL: What is wrong with these people?

Percona just released their MySQL PAM Authentication insanity, just as Oracle did before, for MySQL 5.5 and MariaDB is no better.

The Oracle module requires a module to be loaded into your client, which is done automatically if the module is present and the server supports PAM auth. The module is called ominously "mysql_clear_password" and does what it says on the tin: Your database server access password is henceforth sent from the client to the server in clear, not encrypted, hashed, salted or otherwise protected.

I suppose the Percona module does the same, although it is not being mentioned in the docs at all (or at least I have not been able to find it in there). They also openly suggest to run the database server as root, as that is the only way for an in-process PAM auth module to be able to access /etc/shadow.


Does any of you know what SASL is and why it has been invented?

I know it's a pain, but it is there for a reason. Many reasons. saslauthd for example will read your authentication secrets instead of your worker process, because you are unable to write and maintain a secure codebase the size of a database server. And by speaking SASL on the wire and then handing off an authenticated connection to your actual worker code you gain access to a number of integrated mechanisms for communicating passwords in a compatible and secure manner, none of which include clear text passwords on the wire.

Can we please bury these plugins, deeply in the Mariana trench, in a CASTOR, put a warning beacon over the site and then start over, doing it right this time?

Thanks. I knew you would see the light eventually.

Call for best practice: Talking to r/o slaves through a load-balancer

I am looking for people who have a bunch of r/o slaves running, and who are using a load balancer to distribute queries across them.

The typical setup would be a PHP or Perl type of deployment with transient connections which end at the end of the page generation, and where a reconnect is being made at the next request serviced. The connect would go to the load balancer, which will forward it to any suitable database in the pool.

I am looking for people who are actually deploying this, and what strategies they have to cope with potential problems. I also would like to better understand what common problems are they needed to address.

Things I can imagine from the top of my head:

- Slave lag. Slave lag can happen on single boxes due to individual failures (battery on raid controller expires) or many boxes (ALTER TABLE logjams hierarchy). In the latter case boxes cannot be dropped from the load balancer lest you end up with an empty pool.

- Identifying problematic machines and isolating faults. At the moment, problematic machines sending requests are easily identified: We can SHOW PROCESSLIST, see the problem query, and the host and port it is coming from. We can find that, lsof on the offending source machine and see what the process is. With an LB inbetween we do lose this ability, unless we do fearful layer 2 magic at the LB. How do you identify sources of disruption elegantly and find them to take them out?

- What is a good pool size? We can unify any number of cells up to an entire data centers capacity from individual cells into one single supercell, but we think that this may be too big a setup. What are sizing guidelines to be used here?

What else am I missing here?

LOAD DATA INFILE (and mysqldump)

A colleague of mine has been benchmarking mysqldump data load vs. various versions of LOAD DATA INFILE. He created a sample data as a text file with either 100k or 20M rows of five integers each, the first column of which is the pk.
perl -MList::Util=shuffle -e '@k=shuffle(1..20e6);
  for (@k) {
    print $_, "    ", join("    ", map int(rand(1e9)), 0..3), "\n";
}' > loadme_nonpkorder.txt

perl -e 'print ++$i, "    ", join("    ", map int(rand(1e9)), 0..3), "\n" 
  for 1..20e6' > loadme_pkorder.txt

All insertion has been done on empty and new tables. The text files we read at least once before to warm up the OS disk cache. The tables have two non-unique single-column indexes. All happens on a idle-ish DB master with some substantial memory and a NetApp hosting the datadir (via XFS and LVM).

He benchmarked four cases:
  1. Insertion in PK order.
  2. Insertion in PK order, dropping indexes before insertion and re-adding them later.
  3. Insertion in random order.
  4. Insertion in random order, dropping indexes before insertion and re-adding them later.

Summary: The result is not surprising: Both using PK order and dropping/re-adding indexes improves performance considerably. The PK order insertion becomes more and more crucial with a larger dataset (which is not at all surprising if you think about what happens when adding a record to the innodb PK tree).

Continue reading "LOAD DATA INFILE (and mysqldump)"

Fighting the mysqld init script

Today we discovered a particularly subtle way of fucking up a server restart. After a routine configuration change and RPM upgrade, a colleague tried to restart an important master. That failed. The message:
root@master ~]# /etc/init.d/mysql start
Starting MySQLCouldn't find MySQL manager (//bin/mysqlmanag[FAILED]erver (//bin/mysqld_safe)

The colleague tried multiple times, and finally resorted to manually typing a
nohup mysqld_safe ...
into a screen console, which he detached.

That took care of production for now and left us with an investigation. Why is the init script trying to start the MySQL manager?

It is not, and never tried to. What happen?

Continue reading "Fighting the mysqld init script"

Down the dirty road

Ok. So it all begins with somebody who is using INSERT ON DUPLICATE KEY UPDATE. That guy wants to count the number of UPDATE actions that statement has taken, as opposed to INSERT actions.

We could have been using mysql_info() to fetch that information. But instead we rig the UPDATE clause:
root@localhost [kris]> create table t ( 
  id integer unsigned not null primary key, 
  d integer unsigned not null 
) engine = innodb;
Query OK, 0 rows affected (0.16 sec)

root@localhost [kris]> insert into t values ( 1, 1), (2,2), (3,3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

root@localhost [kris]> set @x = 0;
Query OK, 0 rows affected (0.00 sec)

root@localhost [kris]> insert into t values (4,4), (2,1), (3, 1) 
-> on duplicate key update 
-> d= values (d) + 0\* ( @x := @x +1 );
Query OK, 5 rows affected (0.00 sec)
Records: 3  Duplicates: 2  Warnings: 0

root@localhost [kris]> select @x;
| @x   |
|    2 |
1 row in set (0.00 sec)
Wonderful side effects! And this is only the beginning. Continue reading "Down the dirty road"

Getting SQL from a SPAN port

Recently I needed the query stream hitting a very busy master. Normally I would have been using the MySQL Proxy to collect queries, but on a very busy machine the Proxy is as much of a problem as it is part of the solution, so I chose a different approach.

I had a SPAN port configured for the master, which is Ciscospeak for a port on a switch which mirrors all traffic of one or more other ports. I had an otherwise idle machine listening to the SPAN port on a spare network card. That way it is possible to collect traffic to and from the master without any interference with the master.

On the listener box, I had tcpflow collecting data to my master (and only traffic to, not from the master):
tcpflow -i eth1 dst master and port 3306
These tcpflow files now need to be processed into a slow-log like format for further processing. For that I wrote a very simple processor in C after some experimentation with tcpdump and mk-query-digest had been shown as being too slow to keep up.

The processor is called extract_queries and it's souce can be found below. It would be used like so:
# mkdir flow
# cd flow
# tcpflow -i eth1 dst master and port 3306
(wait 1h)
# cd ..
# find flow -print0 | xargs -0 extract_queries -u > slow
# mysqldumpslow -s c slow > stats

The Source: (extract_queries.c)