Skip to content

Fighting the mysqld init script

Today we discovered a particularly subtle way of fucking up a server restart. After a routine configuration change and RPM upgrade, a colleague tried to restart an important master. That failed. The message:
CODE:
root@master ~]# /etc/init.d/mysql start
Starting MySQLCouldn't find MySQL manager (//bin/mysqlmanag[FAILED]erver (//bin/mysqld_safe)

The colleague tried multiple times, and finally resorted to manually typing a
CODE:
nohup mysqld_safe ...
into a screen console, which he detached.

That took care of production for now and left us with an investigation. Why is the init script trying to start the MySQL manager?

It is not, and never tried to. What happen?

To debug, I copied /etc/init.d/mysql to another location and instrumented a shell debugger. :-) That is, I changed the following lines:
CODE:
    287 case "$mode" in
    288   'start')
    289     # Start daemon
...
    303     echo $echo_n "Starting MySQL"
    304     if test -x $manager -a "$use_mysqld_safe" = "0"
    305     then
...
    311       # Give extra arguments to mysqld with the my.cnf file. This script may
    312       # be overwritten at next upgrade.
    313 echo      "$manager" \
    314         --mysqld-safe-compatible \
    315         --user="$user" \
    316         --pid-file="$pid_file" >/dev/null 2>&1 &
    317 set > x
    318 exit
    319       wait_for_pid created $!; return_value=$?
...
    327     elif test -x $bindir/mysqld_safe
    328     then
    329       # Give extra arguments to mysqld with the my.cnf file. This script
    330       # may be overwritten at next upgrade.
    331       pid_file=$server_pid_file
    332 echo      $bindir/mysqld_safe --datadir=$datadir --pid-file=$server_pid_file $other_args >/dev/null 2>&1 &
    333 set > x
    334 exit
    335       wait_for_pid created $!; return_value=$?

In english: I let the script echo the command instead of executing it, and then I save all shell variables to a file and exit.

Looking at the variable dump, I can see that
CODE:
use_mysqld_safe=1

So it is proven that the script never tried to actually execute the manager. In fact, the actual error message is
CODE:
      log_failure_msg "Couldn't find MySQL manager ($manager) or server ($bindir/mysqld_safe)"

It is the red-colored [FAILURE] printed on top of that by the operating system sugar coating for the boot process that obscures the message and makes things fail.

The error message is neither unclear nor incomplete by itself. The only way it could possibly be cleaner would be
CODE:
if [ "$use_mysqld_safe" = 0 ]
then
  log_failure_msg "Couldn't find server ($bindir/mysqld_safe)"
else
  log_failure_msg "Couldn't find MySQL manager ($bindir/mysqlmanager)"
fi

So the real error is that it cannot find //bin/mysqld_safe, which is pretty obvious, because it is /usr/bin/mysqld_safe. $bindir has the wrong content.

How so?

Here is how it is set:
CODE:
if test -z "$basedir"
then
  basedir=/
  bindir=/usr/bin
  if test -z "$datadir"
  then
    datadir=/var/lib/mysql
  fi
  sbindir=/usr/sbin
  libexecdir=/usr/sbin
else
  bindir="$basedir/bin"
  if test -z "$datadir"
  then
    datadir="$basedir/data"
  fi
  sbindir="$basedir/sbin"
  libexecdir="$basedir/libexec"
fi
This code checks if $basedir is forced ('test -z "$basedir') and if it is NOT, it assumes FHS layout with the tools in /usr/bin where they belong. If $basedir is forced, it assumes instead what I tend to call /opt layout, with the tools in /opt/bin if for example $basedir were set to /opt.

Also, from the error message with the double-/ ("//bin/mysqld_safe") we now know that basedir is forced, and set to /.

And lo and behold:
CODE:
[root@master ~]# pwd
/root
[root@master ~]# cat .my.cnf
[client]
socket=/mysql/master/data/mysql.sock
password=...

[mysqld]
basedir=/
When did that happen? Well, git will know.
CODE:
[root@master root]# git log -p .my.cnf 
commit 306254afee7293887a85884ecaaff9b2d13bb9af
Author: store_local_config <dba.cron>
Date:   Mon Nov 8 14:03:11 2010 +0100

    Changes committed by store_local_config

diff --git a/root/.my.cnf b/root/.my.cnf
index e871dad..6551caa 100644
--- a/root/.my.cnf
+++ b/root/.my.cnf
@@ -1,3 +1,6 @@
 [client]
 socket=/mysql/master/data/mysql.sock
 password=...
+
+[mysqld]
+basedir=/
So at least we know when this happened, but not the who and why.

There are several things to be learned here:
  • The start/stop script picks up server options from the local users .my.cnf files. This feels deeply dangerous and wrong to me.
  • There is no internal state inside the script that marks the used layout type as either FHS or /opt. It is impossible later in the script to decide what happened in the original $basedir test and print that for debug.
  • Operating system boot message decoration can really bite you in the ass.
  • Obsessive-compulsive configuration change tracking is a good thing, expecially on a master.


Oh. And we removed the [mysqld]-section from the .my.cnf file in root's home.

Trackbacks

No Trackbacks

Comments

Display comments as Linear | Threaded

Luciano Rocha on :

For debugging, -x is a good option:
CODE:
bash -x /etc/init.d/mysql start 

Kristian Köhntopp on :

Yep. That is what I did with the modified script, running it as
CODE:
bash -xv ~kris/mysql > log 2>&amp;1

That left me with an annotated execution trace in ~kris/log and a variable dump at the exit point in ~kris/x, and ultimately let to the solution that is this writeup.

Mike on :

Another surprise is that despite mysqld --verbose --help's claim that "Default options are read from the following files in the given order:" options will be read from datadir/my.cnf:

http://dev.mysql.com/doc/refman/5.1/en/option-files.html

Simon J Mudd on :

You beat me to blogging about this rather frustrating issue, but thanks for writing up on it. Certainly a gotcha which is hard to spot "quickly".

So I've written a bug report about it which you can find here:

http://bugs.mysql.com/bug.php?id=60729

Let's hope that this can be addressed by better logging (in the script) and by the server and of course better change control by the DBAs.... :-)

Andrii on :

Not sure why reading options from ~/.my.cnf is wrong?
If user has permission to start mysqld, then it has right to define custom configuration for it. (i.e. user-specific)
Perhaps we have huge server box with multiple users who start multiple mysqld instances on it.

Kristian Köhntopp on :

mysqld picks up options from many sections found in many configuration files, happily overwriting previous values with later found values, only when it is not (for example with replication filters, which instead add to each other). The resulting configuration is random, extremely hard to debug and generally a pain in the ass.

The option --defaults-file exists, and can be used to maintain multiple instances in a clear and unambigous way.

In fact, I happen to believe that the following would be a great improvement: a server that refuses to start UNLESS you give it a --defaults-file option and which then will read server configuration from that one file from one single section, and which will complain about duplicate keywords with an error that prevents startup.

Kristian Köhntopp on :

Also, that configuration file I'd like to keep in $datadir, so that it is part of the backup by default (restarting InnoDB without restating the exact values it already knows, for example innodb_log_file_size, is really hard).

And I'd like the configuration of each server in a separate file - the way mysqlmanager handles this is completely broken: all servers running under the same user id, configurations of multiple servers mixed up in one file, and an inheritance schema for configuration options so that you can neither backup this properly nor can you migrate one instance elsewhere easily. Somebody has been adding a lot of code to make my life harder here.

Add Comment

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA

BBCode format allowed