MySQL 5.6.4-m7 is made of pure win
I have an internal project that is generating a materialized view of some pretty important data, using 96 application server cores against a 12-core database server with 192 GB of memory and a buttload of SSD, good for about 250 MB/sec peak write rate in synthetic file creation.
The project was underperforming, even with MySQL 5.5-current. We never got more than a 15 MB/sec sustained write rate, and never more than 24 GB of dirty pages in InnoDB. The appropriate (self-made) Merlin graphs showed redo log contention and excessive checkpointing.
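For reference: without Merlin, roughly the same signal can be read off the server by hand. This is a generic sketch against a stock 5.5/5.6 server, nothing specific to our setup:
CODE:
-- Checkpoint age = "Log sequence number" minus "Last checkpoint at"
-- in the LOG section of the status output.
SHOW ENGINE INNODB STATUS\G
-- Dirty page volume: pages times the page size (16 KB by default).
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';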
Enter MySQL 5.6.4-m7, which can do large redo logs. A quick back-of-the-napkin calculation shows that around 32 GB of redo log will most likely be just fine for our workload. So I configure two ib_logfiles of 16000M each, and we turn on the load.
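The relevant my.cnf bits look roughly like this (a sketch only; the rest of our configuration is omitted):
CODE:
# Two redo log files of 16000M each, about 32 GB of redo log in total.
# MySQL 5.6.4 and later (as well as Percona XtraDB and MariaDB) accept a
# combined redo log size larger than the 4 GB limit of 5.5.
innodb_log_files_in_group = 2
innodb_log_file_size      = 16000M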
The result is a sustained write rate of 100 MB/sec, until the redo log reaches about 24 GB in size. Then queries stall, the internal stall protection kicks in and backs off for a minute or two, and the database checkpoints furiously. The load comes back on after the timeout, and the database stabilizes nicely at 24 GB of active redo log with ongoing checkpointing, a 150 MB/sec aggregate write rate, and a load of 11 (on 12 cores, that is just dandy).
Since nothing exciting happens, I kill -9 the server, hard, with 24 GB of active redo log. Recovery takes 37 minutes, with no data lost.
CODE:
120327 18:18:19 mysqld_safe Starting mysqld_using_numactl daemon with databases from /mysql/fav/data
120327 18:18:19 [Note] Plugin 'FEDERATED' is disabled.
...
120327 18:18:29 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 230713314587
120327 18:18:30 InnoDB: Database was not shut down normally!
...
InnoDB: Doing recovery: scanned up to log sequence number 254522008304
InnoDB: 2 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 668 row operations to undo
InnoDB: Trx id counter is 7581696
120327 18:28:23 InnoDB: Starting an apply batch of log records to the database...
...
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 500411015, file name ../log/binlog.000089
120327 18:55:24 InnoDB: Starting in background the rollback of uncommitted transactions
InnoDB: 128 rollback segment(s) are active.
120327 18:55:24 InnoDB: Rolling back trx with id 7581431, 238 rows to undo
120327 18:55:25 InnoDB: Waiting for the background threads to start
InnoDB: Rolling back of trx id 7581431 completed
120327 18:55:25 InnoDB: Rolling back trx with id 7581430, 430 rows to undo
InnoDB: Rolling back of trx id 7581430 completed
120327 18:55:25 InnoDB: Rollback of non-prepared transactions completed
120327 18:55:26 InnoDB: 1.2.4 started; log sequence number 254522008304
...
120327 18:55:30 [Note] /usr/sbin/mysqld: ready for connections.
This is completely made out of win. I am really, really, really looking forward to the 5.6 GA release. Well done, Oracle-MySQL!
(Note: large redo logs are also available with Percona XtraDB and MariaDB.)

96 appserver cores running 192 connections against my 12-core database server. A short stall at 17:52, then recovery, and the load comes back on at 17:56 or so.

CPU utilization at just 50% on 12 cores with HT on; I want to stay out of the HT zone, so I try to stay below 50%.

Dirty pages climb to 55 GB until checkpointing comes on due to redo log pressure, then dip briefly because the appservers back off for a few minutes after the stall. Afterwards: stability.

We get a write rate of 100 MB/sec until the checkpointing comes on. After the stall, accumulated writes plus checkpointing run at around 150 MB/sec. The disk subsystem was measured at 250 MB/sec in raw file creation.

We start benchmarking with an ice-cold buffer pool that warms up during the run. Even so, the performance is very satisfactory.

The all-important checkpoint age graph: the redo log grows to 24 GB, and checkpointing kicks in (creating a short initial stall). Afterwards, the checkpoint age stabilizes, as the database is capable of handling the write load and checkpointing at the same time.

Apart from the stall, the concurrency level inside the InnoDB entry lock stays sane, which is good.

12 real cores, target load 11 for good utilization.
Comments
Henrik Ingo on :
When you say "internal stall protection", is that something in InnoDB or in your app?
Can you elaborate on the back-of-napkin calculation you use to calculate redo log size?
Kristian Köhntopp on :
The back-of-napkin calculation looked at the amount of dirty pages desired (the maximum would be innodb_buffer_pool_size scaled with innodb_max_dirty_pages_pct) and the ratio of actual redo log usage observed to actual dirty pages observed. In the application and for this type of workload, that ratio seemed to be relatively stable at around 1:6 to 1:8.
So in our environment we wanted something in the range of 192 GB of RAM (essentially all of it being buffer pool) divided by a ratio of 6 (or 8) = 32 GB (or 24 GB) of redo log. The actual value does not really matter, as the ratio is not overly stable, but we wanted to hit the right order of magnitude and get the first digit right.
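Spelled out (just the arithmetic from above, nothing server-specific):
CODE:
-- Observed ratio: dirty data / redo usage (e.g. from the graphs above),
-- then total redo log ~ buffer pool size / that ratio.
SELECT 192 / 6 AS redo_gb_at_ratio_6,  -- 32 GB
       192 / 8 AS redo_gb_at_ratio_8;  -- 24 GB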
There is little value in tuning this too much: Running at 100% efficiency just means you have zero elasticity.