
Connection Scoped State in MySQL

This is the translation of an article from my German-language blog. It is not a literal translation, but has been amended and changed a bit to take more recent information into account.

It started out as a discussion in the German-language MySQL group on USENET. The eternal question came up there of why phpMyAdmin gets no love at all from the helpers and regulars in that group. My answer was:

phpMyAdmin (PMA) like many other GUI tools for MySQL has a number of limitations. For a web tool such as PMA these come from its operating principles and can hardly be changed. But let's start at the beginning:

In MySQL the connection is a special context or scope for many things. At least the following things are part of the connection scope:
  • Transactions. A disconnect implies a ROLLBACK.
  • Locks. Transactions generate locks through writing statements or SELECT ... FOR UPDATE. Table locks are generated by LOCK TABLES. A disconnect releases the locks.
  • The number returned by LAST_INSERT_ID() is cached within the connection context. It is unavailable after a disconnect.
  • Tables created by CREATE TEMPORARY TABLE. They are deleted on disconnect.
  • Prepared Statements and the parsed bytecode of stored routines are kept per connection, even if that is the wrong solution from a performance POV.
  • @-variables. SET @bla = 10 or SELECT @bla := count(*) FROM cookie; are defining variables within the context of a connection. They are lost on disconnect.
  • SESSION-parameters. SET SESSION myisam_sort_buffer_size = 1024*1024*64 or SET @@session.myisam_sort_buffer_size = 1024*1024*64 set configuration parameters with session scope. They are lost on disconnect.
  • Replication specific SET-commands such as SET TIMESTAMP or SET LAST_INSERT_ID can affect the behavior of functions such as SELECT NOW() or SELECT LAST_INSERT_ID().
I am calling all of that connection scoped state.
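As an illustration, here is a single session touching several of these scopes in one go (the table and variable names are made up):

```sql
-- everything below exists only for the lifetime of this one connection
SET @counter = 10;                                    -- @-variable
SET SESSION myisam_sort_buffer_size = 1024*1024*64;   -- session parameter
CREATE TEMPORARY TABLE scratch (id INT AUTO_INCREMENT PRIMARY KEY);
INSERT INTO scratch VALUES (NULL);
SELECT LAST_INSERT_ID();   -- cached in this connection only
-- on disconnect the @-variable, the session setting, the temporary
-- table and the LAST_INSERT_ID() value are all gone
```

A connection pool that hands this connection to another client without resetting it hands over all of this state as well.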

A client that disconnects in an uncontrolled way, or that does not report a disconnect properly, is defective in the sense that all functionality depending on connection scoped state becomes unavailable. The opposite case also exists - a client that reuses existing connections may get a connection with unclean state, which may or may not be a security risk.
Continue reading "Connection Scoped State in MySQL"

DELETE, innodb_max_purge_lag and a case for PARTITIONS

Where I work, Merlin is an important tool for us and provides a lot of insight that other, more generic monitoring tools do not provide. We love it, and in fact love it so much that we have about 140 database agents reporting into Merlin 2.0 from about 120 different machines. That results in a data influx of about 1.2G a day without using QUAN, and of about 6G a day using QUAN on a set of selected machines.

It completely overwhelms the Merlin data purge process, so the merlin database grows out of bounds, which is quite unfortunate because our disk space is in fact very bounded.

The immediate answer to our purge problem was to disable the Merlin-internal purge and, with the kind help of MySQL support, to create a script which generates a list of record ids to delete. These ids end up in a number of DELETE statements with very large WHERE ... IN (...) clauses that do the actual delete.
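The generated statements look roughly like this (the table name and ids are invented for illustration):

```sql
-- each statement removes one manageable batch of expired rows,
-- using the id list produced by the external script
DELETE FROM metric_data WHERE id IN (101, 102, 103, 104);
DELETE FROM metric_data WHERE id IN (105, 106, 107, 108);
-- ... and so on, until the generated id list is exhausted
```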

This is a band-aid fix, which does work in a way, but it also has unintended consequences. Or, as we like to say around here: 'That also breaks, but in a different and interesting way.' Continue reading "DELETE, innodb_max_purge_lag and a case for PARTITIONS"


If you have a slave, that slave is probably running with the read-only flag set in the mysqld section of your my.cnf. To be able to write to a read-only slave you need to be the replication SQL_THREAD or have the SUPER privilege.

Since 5.0.16 it is possible to execute CREATE TEMPORARY TABLE on a read-only slave, so the CREATE TEMPORARY TABLES privilege also allows you to write to a read-only slave in a limited and controlled way.

If you want to process a lot of data in a temporary table, you probably create the temporary table without any indexes, INSERT ... SELECT the data into it, and ALTER TABLE ... ADD INDEX afterwards, because that is usually faster than inserting data into a table with indexes. Except that you cannot ALTER TABLE a temporary table without further privileges, even on a server that is not read-only: to run ALTER TABLE on any table, even a temporary one, you need the ALTER privilege, which you might not want to give out lightly.

There is no reason at all to check the ALTER privilege for an ALTER TABLE operation on a temporary table, because that table is visible only to your connection and cannot be shared. It is also deleted when you disconnect. In fact there is no reason at all to check permissions for temporary tables. But it is done.

Because it is done, you can grant ALTER on a single table that exists, or even on a table that does not yet exist. If you grant the ALTER privilege on an existing table on a read-only server, you cannot alter that table, because the server is read-only. If you then use CREATE TEMPORARY TABLE to create a table with that name, the temporary table will shadow the existing persistent table, and for your connection the persistent table becomes inaccessible.

The semantics of the GRANT change, though: it now applies to the temporary table, which is writeable on a read-only server because it is temporary, and alterable because of the grant, which was never meant to apply to it in the first place. Problem solved: I can now ALTER TABLE my temporary table on the read-only server after I have finished my data load.
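Put together, the trick looks like this (schema, table and column names are invented; assume the server runs with read-only = 1 and the connecting user has CREATE TEMPORARY TABLES on the schema plus a grant of ALTER on reports.t):

```sql
-- the persistent table reports.t exists, but is not writable here
CREATE TEMPORARY TABLE reports.t LIKE reports.t;  -- shadows the persistent table
INSERT INTO reports.t
  SELECT * FROM reports.source_data;              -- works: the table is temporary
ALTER TABLE reports.t ADD INDEX (some_col);       -- works: the grant now matches the shadow
```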

All is well? Not!

There are multiple things at work here which I consider broken:
  1. GRANTs are applied to temporary tables. This makes no sense at all in my book. Temporary tables are connection-local objects; grants, which always referred to persistent objects when they were made, should not apply to them.
  2. Temporary tables can shadow persistent tables in the namespace of a connection. Because GRANTs are tied to objects via the object's name and not via a UUID or some other truly unique object identifier, a GRANT can refer to changing objects even when the grant itself does not change. This feels somehow broken, as in "not properly normalized". Does RENAME TABLE edit the grant tables as well? I have to check!
  3. By granting the CREATE TEMPORARY TABLES privilege to a user I am allowing that user to shadow any other object within a schema. The temporary table then picks up any rights granted to the shadowed object for the duration of its lifetime. This cannot be good.

Seven times faster commit speed in Windows?

According to my findings in Bug #31876, MySQL does not commit data to disk on Windows using the same method MS SQL Server and DB2 use. The method MySQL uses appears to be seven times slower in pathological scenarios.

The bug report contains a patch - thanks to the MySQL WTF (the Windows Task Force) and to the lab provided by the customer for helping me find this.

Does this work for you? I want to hear about your test results.


Rubyisms

Lately, I have had the opportunity to evaluate a very large Ruby installation that was also growing very quickly. A lot of the work performed on site has been specific to the site, but other observations are true for the platform no matter what is being done on it. This article is about Ruby on Rails and its interaction with MySQL in general.
Continue reading "Rubyisms"

Replication - now and then

One of the major contributing factors to the success of MySQL is the ease and simplicity of its replication. Read slaves for scaleout and backup slaves for non-interrupting backups are the norm in any MySQL installation I have seen in the last two years.

So how does replication work? And how shall it be expanded in the future?

What is available?

The binlog written by MySQL currently logs all statements changing the tablespace; it is a serialization of all tablespace changes. The binlog position, expressed as (binlog name, offset), is a database-global timestamp. A timestamp expressed as seconds.fraction would not work at any precision, because on a multi-core machine multiple things can happen concurrently.

If you want to make a consistent full backup of the database, the database must not change during the backup. That is, it must be possible to associate exactly one binlog position with the backup.

In fact, if you have such a backup - one associated with a binlog position - and you happen to have the binlogs from that time until now, it is possible to do a point-in-time (PIT) recovery. You recover from the full backup and then replay the binlog from the backup's binlog position until now. That is why it is important to store the binlog on a filesystem that fails independently of the rest of your MySQL. That is also why you must not filter the binlog written by MySQL using binlog-do-db and binlog-ignore-db - if you do, you get an incomplete binlog that will fail to be useful in a PIT recovery scenario.
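A PIT recovery can then be sketched like this - the binlog file names and the start position are hypothetical, taken from the backup's recorded metadata:

```shell
# 1. restore the consistent full backup first
# 2. replay everything that happened since, starting at the
#    binlog position recorded with the backup
mysqlbinlog --start-position=1234 mysql-bin.000042 mysql-bin.000043 \
  | mysql -u root -p
```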

A slave in MySQL is nothing but a binlog downloader and executor: the slave is restored from a PIT-capable full backup. It is then told the current binlog position and where to log in to fetch the missing binlog. The slave's IO_THREAD logs into the master server and downloads the binlog to local disk as fast as possible, storing it as the relay log. The slave's SQL_THREAD then executes the relay log as fast as possible. Replication can thus be thought of as an ongoing live recovery.
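Pointing the restored slave at the master is done with CHANGE MASTER TO; host, credentials, file name and position below are placeholders:

```sql
CHANGE MASTER TO
  MASTER_HOST     = 'master.example.com',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000042',
  MASTER_LOG_POS  = 1234;
START SLAVE;          -- starts the IO_THREAD and the SQL_THREAD
SHOW SLAVE STATUS\G   -- watch Seconds_Behind_Master shrink
```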

Continue reading "Replication - now and then"

Innodb cache preloading using blackhole

In MyISAM, we do have LOAD INDEX INTO CACHE. For InnoDB this does not work. For benchmarking I often need a way to preload the innodb_buffer_pool with the primary key and data after a server restart, to shorten warmup phases.

According to Blackhole Specialist Kai, the following should work:
mysql> create table t like innodbtable;
mysql> alter table t engine = blackhole;
mysql> insert into t select * from innodbtable;
Another win for the unbreakable BLACKHOLE storage engine.

Changing everything

This article does not even contain the words database or MySQL. I still believe it is somewhat interesting.

Mail has, for some reason, always played a big role in my life. I was running mail for two people - my girlfriend and me - in 1988. I was running mail for 20 and then 200 people in 1992, setting up a citizens' network. Later I designed and built mail systems for corporations of 2 000 and 20 000 people, and planned mail server clusters for 200 000 and 2 million users. And just before I became a consultant at MySQL I was working for a shop that did mail for a living, for 20 million users.

Mail is a very simple and well-defined collection of services. You accept incoming messages for local users, you implement relaying for your local users with POP-before-SMTP and SMTP AUTH, you build POP, IMAP and webmail access, and you deploy spam filters and virus scanners for incoming and outgoing messages. This collection of services hardly changes when you go from 2 to 20 million users - maybe the larger systems also provide additional services such as portal services, a news server or other more directed offerings, but that is just fluff outside the scope of the mail system. The solutions, though, are very different, and very much dependent on the scale of your operations.
Continue reading "Changing everything"

Sysadmins Nightly Mental Pain (SNMP)

Dear Diary, today I ventured into one of the darkest realms of the sysadmin profession: I started playing with SNMP. My goal was very simple and quite clearly stated: make the output of "SHOW GLOBAL STATUS" available to an SNMP client. One would think that this is a reasonable and easily fulfilled wish.

Little did I know of the madness and despair that linger in the depths which are guarded by the dread named ASN.1 and where the noxious fumes of the organisation no longer called CCITT can still strongly be smelled.

But let us begin this story at the beginning - with a clean install of SUSE Linux 10.0, my trusty apt4rpm, and me. It was my thought that the Perl support of net-snmp might be handy to get me where I wanted to be: Perl is easily the more convenient language for prototyping than C or C++, and getting to the data source using Perl DBI would be pretty easy. So "apt install net-snmp net-snmp-devel perl-SNMP" it was. With my spirits high and a song on my iPod I ventured out into the documentation...

Continue reading "Sysadmins Nightly Mental Pain (SNMP)"

Notes on VM

Even when it is repeated once more, it is not true:
Stripping binaries using the ‘strip’ utility can also significantly reduce the memory footprint of the application
claims John Coggeshall.

While it is true that a file is smaller on disk after a strip, a quick run of "size" on a binary will show you that the actual binary part of the file is unchanged. Let's have a quick look at /proc/pid/maps to understand what happens.
Continue reading "Notes on VM"

Using oprofile

Oprofile is a profiling tool that requires no instrumentation and can profile an entire system. Binaries with symbol tables (-g compiled binaries) and an uncompressed kernel with a symbol table are helpful.

Oprofile is capable of monitoring and analyzing a running -g compiled mysqld and can tell you which functions are using the most CPU time.

Sample output:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name               symbol name
197404   22.4460            memcpy
165764   18.8484  no-vmlinux               (no symbols)
76051     8.6475  mysqld.debug             _mi_rec_unpack
30059     3.4179  mysqld.debug             Query_cache::insert_into_(Query_cache_block*, Query_cache_block**)
17468     1.9862  mysqld.debug             get_hash_link
16767     1.9065  mysqld.debug             ha_key_cmp
14106     1.6039  mysqld.debug             MYSQLparse(void*)
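A typical session with the opcontrol frontend might look like this - it needs root and the oprofile kernel module, so treat it as a sketch rather than something to paste blindly:

```shell
opcontrol --no-vmlinux          # or --vmlinux=/path/to/vmlinux for kernel symbols
opcontrol --start               # start collecting samples
# ... run the workload against mysqld here ...
opcontrol --dump                # flush collected samples to disk
opreport -l /usr/sbin/mysqld    # per-symbol report like the one above
opcontrol --shutdown            # stop the profiling daemon
```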

Continue reading "Using oprofile"

Handling character sets

Colleague Markus asked me:
Because ... we are now changing dedicated server providers, I have already updated the Gentoo base system. And since we are already changing everything, we might as well migrate from MySQL 4 to 5.

So my question is: shall I switch from latin1 to utf8? Or should I stay with latin1?
I have been writing a German-language article on character sets in MySQL, and this is the translation of it.
Continue reading "Handling character sets"


Statification

In Semi-Dynamic Data, Sheeri writes about semi-dynamic data and content pregeneration. In her article, she suggests that for rarely changing data it is often advisable to precompute the result pages and store them as static content. Sheeri is right: nothing beats static content, not for speed and not for reliability. But pregenerated pages can be a waste of system resources when the number of possible pages is very large, or if most of the pregenerated pages are never hit.

An intermediate scenario may be a statification system and some clever caching logic.

Statification is the process of putting your content generation code into a 404 handler and having that handler generate the requested content and store it as a file. The idea is that on a second request the content will already be there, and thus a static file is served with a 200 OK, using the fast path of the web server.

A typical example of this kind of task would be a script that renders its name as a PNG: requesting an image name returns a PNG containing that name as text, so a request for "example" yields a PNG with the text "example". The number of potential PNGs is infinite, and we cannot possibly precalculate all of them. But a few PNGs are requested over and over - most likely because we reference them via IMG tags in some HTML pages. It would be pointless to generate them again and again for each request, because these images never change.
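With Apache this can be wired up via an ErrorDocument handler - a minimal sketch, assuming the images live under /img/ and a hypothetical render.php writes the PNG into that directory before sending it:

```apache
# any missing file under /img/ falls through to the generator;
# render.php inspects REQUEST_URI, writes the PNG into /img/
# and serves it - the next request is a plain static 200
<Directory "/var/www/html/img">
    ErrorDocument 404 /render.php
</Directory>
```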

Here is how to do it, using PHP:
Continue reading "Statification"

Dealing with failure - the key to scaleout

Scaling Patterns
This is a translation of a German-language article I wrote two weeks ago for my German-language blog.

In 2004, when I was still working for, I gave a little talk on Scaleout at Linuxtag. Even back then one major message of the talk was "Every read problem is a cache problem" and "Every write problem is a problem of distribution and batching":

To scale, you have to partition your application into smaller subsystems and replicate the data. The replication should be asynchronous, not two-phase commit (2PC), or the gain shrinks to nothing. Writes must be delayed and batched, so they can be handled more efficiently. To avoid bottlenecks, data storage should be decentralized and a central database should be avoided.

That often contradicts what traditional database classes at university teach: it is insecure, you are working with false or outdated data, and you do not even know at all times whether everything worked out the way you imagined it should have.

It has an important advantage: It works. And it scales.

Continue reading "Dealing with failure - the key to scaleout"