Fighting the mysqld init script

Today we discovered a particularly subtle way of fucking up a server restart. After a routine configuration change and RPM upgrade, a colleague tried to restart an important master. That failed. The message:
CODE:
[root@master ~]# /etc/init.d/mysql start
Starting MySQLCouldn't find MySQL manager (//bin/mysqlmanag[FAILED]erver (//bin/mysqld_safe)

The colleague tried multiple times, and finally resorted to manually typing a
CODE:
nohup mysqld_safe ...
into a screen console, which he detached.

That took care of production for now and left us with an investigation. Why is the init script trying to start the MySQL manager?

It is not, and never tried to. What happened?

Continue reading "Fighting the mysqld init script"

Comparing HandlerSocket, DBI, DBI+Handler Command and Class DBI

(This article is based on the work of my colleague Herald van der Breggen. The "I" and "my" below refer to him.)

At work, we have a large Perl application running to earn our keep. There are a few ways to read data from MySQL into a Perl application, and we wanted to compare the performance of these methods.
  • HandlerSocket, a NoSQL interface to MySQL (using Net::HandlerSocket) (columns selectable).
  • DBI, the standard Perl database interface (using DBD::mysql underneath) (columns selectable).
  • DBI+HANDLER, DBI queries using HANDLER statements instead of SELECT statements; see HANDLER Syntax and the sketch after this list (full rows only).
  • Class::DBI, an ORM wrapper around DBI, which is what we use in our internal models a lot (the ORM magically selects columns for you, often too many).
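
For reference, the HANDLER interface reads rows directly from a table or index, bypassing most of the SQL layer. A minimal sketch, assuming a table t with an integer primary key (table name and key value are made up for illustration):
CODE:
HANDLER t OPEN;
HANDLER t READ `PRIMARY` = (42);         -- point read via the primary key
HANDLER t READ `PRIMARY` NEXT LIMIT 10;  -- continue scanning from there
HANDLER t CLOSE;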

As a baseline, in a typical page generation we spend around 0-10 ms per page of SQL CPU time in the client, out of 500 ms total CPU time. Wall clock times are larger, of course, because of processing time in the database plus data transfer time. The width and number of columns transported matter, and the various interfaces differ in their ability to control which columns are transferred to the client.
Continue reading "Comparing HandlerSocket, DBI, DBI+Handler Command and Class DBI"

MySQL Administrators (multiple) in Amsterdam and Cambridge

I have been working for this small "not-really-a-travel-agency" in Amsterdam for almost three years now (Booking.com is part of the Priceline Group of Companies): We reserve hotel rooms for guests using a website, and we sleep several hundred thousand people each night.

To accomplish that we operate a three-digit number of MySQL database servers on CentOS in about three dozen replication hierarchies, automating system administration as much as possible using Puppet. Our developers are abusing these boxes using Apache and mod_perl. Because we are still growing faster than the market, we are looking for multiple MySQL DBAs. The positions are based in Amsterdam, NL and Cambridge, UK.

An applicant should have multiple years of experience with MySQL, with a focus on InnoDB, replication and partitions. Good working knowledge of Linux (CentOS) and experience with Puppet or general Linux system administration skills are a useful bonus. So is experience with operations in large-scale environments.

Booking.com can offer help with relocation and with the administrative and legal formalities that come with working abroad. Also, the Netherlands has an attractive tax model for "Knowledge Workers" which may be applicable to you, and we might be able to help with that as well.

The company language is English, and we are located in attractive parts of the city centers in the cities where we are present. Your colleagues in IT come from at least a dozen countries all over the world. The entire production is completely based on open source products, of course. Booking.com is hosting the Perl git repository, and contributes to the development of Perl, Puppet and MariaDB as well as a few other things. Some of your future colleagues are likely to have authored some of the software you are using every day.

Links: DBA Amsterdam (804-006), DBA Cambridge (569-006)

Natural keys vs. auto_increment

The following is an edited transcript from the #mysql channel on the Freenode IRC network. It is here so that it can be referenced by the IRC bot on that channel.

G> So it's not necessarily a bad idea to use say an autoinc as the primary index of a table?

A> It's a very good idea to have an autoinc primary index. See the Manual.

T> A: I tend to prefer the natural primary key if one exists, unless ORM is an issue.

I> For InnoDB, autoinc is usually preferable.
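
The background for I>'s remark: InnoDB clusters table data on the primary key, so a monotonically increasing surrogate key appends new rows at the end of the clustered index instead of splitting pages in the middle. A minimal sketch, with made-up table and column names:
CODE:
CREATE TABLE user (
  id    INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(200) NOT NULL,
  UNIQUE KEY (email)  -- the natural key survives as a unique constraint
) ENGINE = InnoDB;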


Continue reading "Natural keys vs. auto_increment"

Why commercial storage engines do not excite me

In The Doom Of Multiple Storage Engines, PeterZ lists a number of unique technological challenges that MySQL has to face precisely because of its choice of Storage Engines and the way it implements them.

There is another, nontechnical challenge that needs to be solved by vendors producing commercial closed source storage engines for MySQL - my experience is with one specific vendor, but the situation is basically the same for each and every closed source SE:

Where I work, we are a MySQL support customer. Actually, we are a very satisfied MySQL support customer, because in our opinion the MySQL support just plain rocks and has more than once saved our corporate asses, or at least greatly improved our MySQL experience.

If we were to load a closed source storage engine into our MySQL binaries, we would make life a lot harder for MySQL support, and despite the fact that MySQL support is actually extremely cooperative when it comes to non-MySQL binaries, we would most likely end up with a support situation that is a lot less stellar than what we have now.

In the case I am thinking about, the vendor does not just produce a storage engine but, according to their architecture diagram, more or less a fork of MySQL, including changes to the optimizer and other non-SE parts of MySQL. The version strings of the binaries I have seen carry smaller numbers than current MySQL versions, in some cases much smaller, down to what MySQL considers Alpha or Beta versions of the 5.1 server.

If there was a problem with that binary and we were to take it to MySQL support, they would most likely laugh us out of the room, and rightly so. If we took it to the storage engine maker and the problem is not related to their product but to MySQL proper, we are probably not talking to the right company for a fix. Or we are looking at a lot of potential for corporate finger pointing.

The solution - for us at least - is really to beat down any effort by our internal customers to research closed source MySQL storage engines, unless they can show us a support agreement or support endorsement for this storage engine by MySQL/Sun/Oracle. Their product may be technologically sound and the solution to all of our performance problems - we will probably not be interested until there is such an agreement. We would rather introduce another commercial database server (which is neither MySQL nor Oracle) than run such an unsupported configuration.

So the most important thing for any storage engine maker is to be either open source, or be part of a support alliance with MySQL/Sun/Oracle. Without either you do not exist.

Down the dirty road

Ok. So it all begins with somebody who is using INSERT ON DUPLICATE KEY UPDATE. That guy wants to count the number of UPDATE actions that statement has taken, as opposed to INSERT actions.

We could have been using mysql_info() to fetch that information. But instead we rig the UPDATE clause:
CODE:
root@localhost [kris]> create table t ( 
  id integer unsigned not null primary key, 
  d integer unsigned not null 
) engine = innodb;
Query OK, 0 rows affected (0.16 sec)

root@localhost [kris]> insert into t values ( 1, 1), (2,2), (3,3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

root@localhost [kris]> set @x = 0;
Query OK, 0 rows affected (0.00 sec)

root@localhost [kris]> insert into t values (4,4), (2,1), (3, 1) 
-> on duplicate key update 
-> d = values(d) + 0 * ( @x := @x + 1 );
Query OK, 5 rows affected (0.00 sec)
Records: 3  Duplicates: 2  Warnings: 0

root@localhost [kris]> select @x;
+------+
| @x   |
+------+
|    2 |
+------+
1 row in set (0.00 sec)
Wonderful side effects! And this is only the beginning. Continue reading "Down the dirty road"

Getting SQL from a SPAN port

Recently I needed the query stream hitting a very busy master. Normally I would have been using the MySQL Proxy to collect queries, but on a very busy machine the Proxy is as much of a problem as it is part of the solution, so I chose a different approach.

I had a SPAN port configured for the master, which is Ciscospeak for a port on a switch which mirrors all traffic of one or more other ports. I had an otherwise idle machine listening to the SPAN port on a spare network card. That way it is possible to collect traffic to and from the master without any interference with the master.

On the listener box, I had tcpflow collecting data to my master (and only traffic to, not from the master):
CODE:
tcpflow -i eth1 dst master and port 3306
These tcpflow files then need to be processed into a slow-log-like format for further processing. For that I wrote a very simple processor in C, after some experimentation had shown tcpdump and mk-query-digest to be too slow to keep up.

The processor is called extract_queries and its source can be found below. It would be used like so:
CODE:
# mkdir flow
# cd flow
# tcpflow -i eth1 dst master and port 3306
(wait 1h)
(break)
# cd ..
# find flow -print0 | xargs -0 extract_queries -u > slow
# mysqldumpslow -s c slow > stats


The Source: (extract_queries.c)

Open Source 'Leeches'

Infoworld has an article, "The fight over open source 'leeches'", about the failure of enterprises to contribute the changes they make to open source software back upstream to the project.
"When it comes to open source communities, individuals are much better citizens than institutions. The enlightened self-interest that causes individuals to send back bug fixes, contribute ideas for new features, and write documentation is much harder to find in institutions," Dan Woods, CTO of Evolved Media, wrote in Forbes earlier this year.
I have a very old article in my German language blog about open source software and companies, and the Infoworld article prompted me to translate it now:

Many companies do use open source, and that is a good thing. Most of that software is well suited to the task, and usually it is much easier to understand, more flexible and also easier to debug than their closed source equivalents. But open source requires that enterprises adjust their processes with regard to changes and deployment of software in order to be successful open source users.

The fact that you can change open source software lures many companies into making changes to the software. But making the changes is just a small part of the work, and the most inexpensive one.

What usually happens is this:


Continue reading "Open Source 'Leeches'"

Connection Scoped State in MySQL

This is the translation of an article from my German language blog. It is not a literal translation, but has been amended and changed a bit to take more recent information into account.

It started out as a discussion within the German language MySQL group on USENET. There the eternal question came up of why phpMyAdmin gets no love at all from the helpers and regulars in that group. My answer was:

phpMyAdmin (PMA), like many other GUI tools for MySQL, has a number of limitations. For a web tool such as PMA these come from its operating principles and can hardly be changed. But let's start at the beginning:

In MySQL the connection is a special context or scope for many things. At least the following things are part of the connection scope:
  • Transactions. A disconnect implies a ROLLBACK.
  • Locks. Transactions generate locks through writing statements or SELECT ... FOR UPDATE. Table locks are generated by LOCK TABLES. A disconnect releases the locks.
  • The number returned by LAST_INSERT_ID() is cached within the connection context. It is unavailable after a disconnect.
  • Tables created by CREATE TEMPORARY TABLE. They are deleted on disconnect.
  • Prepared Statements and the parsed bytecode of stored routines are kept per connection, even if that is the wrong solution from a performance POV.
  • @-variables. SET @bla = 10 or SELECT @bla := count(*) FROM cookie; are defining variables within the context of a connection. They are lost on disconnect.
  • SESSION parameters. SET SESSION myisam_sort_buffer_size = 1024*1024*64 or SET @@session.myisam_sort_buffer_size = 1024*1024*64 set configuration parameters with session scope. They are lost on disconnect.
  • Replication specific SET-commands such as SET TIMESTAMP or SET LAST_INSERT_ID can affect the behavior of functions such as SELECT NOW() or SELECT LAST_INSERT_ID().
I am calling all of that connection scoped state.
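
A quick demonstration of that scope, reusing the examples from the list above (the table name is made up):
CODE:
SET @bla = 10;                                       -- @-variable
SET SESSION myisam_sort_buffer_size = 1024*1024*64;  -- SESSION parameter
CREATE TEMPORARY TABLE scratch (id INT);             -- temporary table
SELECT @bla, @@session.myisam_sort_buffer_size;
-- After a reconnect, @bla is NULL, the parameter is back at its default,
-- and the table scratch is gone.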

A client that disconnects in an uncontrolled way or that does not report a disconnect properly is defective in the sense that all functionality depending on connection scoped state becomes unavailable. The opposite case also exists: a client that reuses existing connections may get connections with an unclean state, which may or may not be a security risk.
Continue reading "Connection Scoped State in MySQL"

MySQL, Sun and Oracle

This is the translation of a text I wrote for my German language blog two days ago.

Now Sun has been bought, and not by IBM or Cisco, but by Oracle. In the aftermath everybody is asking themselves: what happens to MySQL?

Well, firstly MySQL and InnoDB are now part of the same company. Oracle has been maintaining InnoDB pretty well in the past, and that can only improve.

Will Oracle let MySQL die and try to push their own products into the market? Hardly so. Why should Oracle do that, and if so, using what products?


Continue reading "MySQL, Sun and Oracle"

DELETE, innodb_max_purge_lag and a case for PARTITIONS

Where I work, Merlin is an important tool for us and provides a lot of insight that other, more generic monitoring tools do not. We love it, and in fact love it so much that we have about 140 database agents reporting into Merlin 2.0 from about 120 different machines. That results in a data influx of about 1.2G a day without using QUAN, and of about 6G a day using QUAN on a set of selected machines.

It completely overwhelms the Merlin data purge process, so the Merlin database grows out of bounds, which is quite unfortunate because our disk space is in fact very bounded.

The immediate answer to our purge problem was to disable the Merlin internal purge and, with the kind help of MySQL support, to create a script which generates a list of record ids to delete. These ids end up in a number of DELETE statements with very large WHERE ... IN (...) clauses that do the actual deleting.
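
A sketch of what these statements look like, with made-up table, id and partition names (the real Merlin schema differs):
CODE:
-- the generated delete, batching ids from the script:
DELETE FROM quan_data WHERE id IN (100, 101, 102, 117, 118);
-- the case for partitions: dropping a partition of expired data
-- is a file drop instead of a row-by-row purge:
ALTER TABLE quan_data DROP PARTITION p20090401;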

This delete-based approach is a band-aid fix, which does work in a way, but also has unintended consequences. Or, as we say around here: 'That also breaks, but in a different and interesting way.' Continue reading "DELETE, innodb_max_purge_lag and a case for PARTITIONS"

CREATE TEMPORARY TABLE

If you have a slave, that slave is probably running with the read-only flag set in the mysqld section of your my.cnf. To be able to write to a read-only slave you need to be the replication SQL thread or have the SUPER privilege.

Since 5.0.16, CREATE TEMPORARY TABLE can still be executed on a read-only slave, so the CREATE TEMPORARY TABLES privilege also allows you to write to a read-only slave in a limited and controlled way.
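
To illustrate (a sketch; the table names are made up):
CODE:
-- on a read-only slave, as an ordinary user:
INSERT INTO t VALUES (1, 1);
-- ERROR 1290 (HY000): The MySQL server is running with the --read-only
-- option so it cannot execute this statement
CREATE TEMPORARY TABLE scratch (id INT UNSIGNED NOT NULL);
-- Query OK: temporary tables are exempt from read-only since 5.0.16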

If you want to process a lot of data in a temporary table, you are probably creating the temporary table without any indices, then INSERT ... SELECT data into it, and then ALTER TABLE ... ADD INDEX afterwards, because that is usually faster than inserting data into a table with indices. Only you cannot ALTER TABLE a temporary table just like that, even on a server that is not read-only: in order to run ALTER TABLE on any table, even temporary ones, you need the ALTER privilege, which you might not want to give out lightly.

There is no reason at all to check the ALTER privilege for an ALTER TABLE operation on a temporary table, because that table is visible only to your connection and cannot be shared. It is also deleted when you disconnect. In fact there is no reason at all to check permissions for temporary tables. But it is done.

Because it is done, you can grant ALTER on a single table that exists, or even on a table that does not yet exist. If you grant the ALTER privilege on an existing table on a read-only server, you cannot alter that table, because the server is read-only. If you then use CREATE TEMPORARY TABLE to create a table with that name, the temporary table will shadow the existing persistent table, and for your connection the persistent table will become inaccessible.

The semantics of the GRANT change, though: it now applies to the temporary table, which is writeable on a read-only server because it is temporary, and alterable because of a grant that was never meant to apply to it in the first place. Problem solved: I can now ALTER TABLE my temporary table on the read-only server after I have finished my data load.
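
Put together, the trick looks like this (a sketch; the user, schema and table names are hypothetical):
CODE:
-- as an administrator, once:
GRANT CREATE TEMPORARY TABLES ON kris.* TO 'loader'@'localhost';
GRANT ALTER ON kris.scratch TO 'loader'@'localhost';

-- as loader, on the read-only slave:
CREATE TEMPORARY TABLE kris.scratch (
  id INTEGER UNSIGNED NOT NULL,
  d  INTEGER UNSIGNED NOT NULL
);
INSERT INTO kris.scratch SELECT id, d FROM kris.t;  -- fast load, no indices
ALTER TABLE kris.scratch ADD INDEX (d);  -- the grant now hits the temp table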

All is well? Not!

There are multiple things at work here which I consider broken:
  1. GRANTs are applied to temporary tables. This makes no sense at all in my book. Temporary tables are connection-local objects, and grants, which always referred to persistent objects when they were made, should not apply to them.
  2. Temporary tables can shadow persistent tables in the namespace of a connection. Because GRANTs are tied to objects via the object's name and not via a UUID or another form of truly unique object identifier, GRANTs can refer to changing objects even when the grant itself does not change. This feels somehow broken, as in "not properly normalized". Does RENAME TABLE edit the grant tables as well? I have to check!
  3. By granting the CREATE TEMPORARY TABLES privilege to a user, I am allowing that user to shadow any other object within a schema. The temporary table will then pick up any rights granted to the shadowed object for the duration of its lifetime. This cannot be good.

Configuring InnoDB - An InnoDB tutorial

This is the English translation of another article in my German language blog.

How are transactions organized physically?

When InnoDB creates a new transaction, it is not yet committed. The database has not yet made any promises to the application, so we do not really have to make anything persistent so far.

To be fast, InnoDB tries to assemble the transaction in a memory buffer, the innodb_log_buffer. It should be sufficiently large that you can actually assemble such a transaction in memory without having to write parts of it out into the redo log early. A size of 1M to 8M is normal.

Once a transaction is to be committed, InnoDB has to read the page from disk which contains the image of the row that is being changed. It then has to actually make that change in memory. The changed page is cached in a memory pool called the innodb_buffer_pool. This pool also caches unchanged pages that have been accessed by read operations. All of these pages, on disk and in memory, are 16K in size, and innodb_buffer_pool_size determines how much RAM we will use as a cache for such pages - usually as much as we can spare.
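
As a my.cnf sketch (the numbers are placeholders; size them to your transactions and your available RAM):
CODE:
[mysqld]
# assemble transactions in memory before they go to the redo log
innodb_log_buffer_size  = 8M
# cache for 16K pages, changed and unchanged - as much RAM as we can spare
innodb_buffer_pool_size = 4G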
Continue reading "Configuring InnoDB - An InnoDB tutorial"