Skip to content

Make me a MEM replication delay screen

"List me all databases that have a current replication delay of more than 10 seconds."
"Easy. Let's fetch the data from Merlin."

And that is how it started.

The mem schema has a table inventory_attributes, which decodes reported attribute names into attribute_ids:
CODE:
mysql> select * from inventory_attributes limit 10;
+--------------+---------+-----------------------+----------------+--------------+
| attribute_id | type_id | attribute_name        | attribute_type | insert_count |
+--------------+---------+-----------------------+----------------+--------------+
|            1 |       1 | agent.reachable       | string         |            0 | 
|            2 |       1 | host_id               | string         |            0 | 
|            3 |       1 | name                  | string         |            0 | 
|            4 |       1 | version               | string         |            0 | 
|            5 |       2 | blackout              | string         |            0 | 
|            6 |       2 | displayname           | string         |            0 | 
|            7 |       2 | registration-complete | string         |            0 | 
|            8 |       2 | repl.groupName        | string         |            0 | 
|            9 |       2 | transport             | string         |            0 | 
|           10 |       2 | visible.displayname   | string         |            0 | 
+--------------+---------+-----------------------+----------------+--------------+
10 rows in set (0.00 sec)


There is an attribute with the name we want:

CODE:
mysql> select * from inventory_attributes where attribute_name like '%seconds_beh%';
+--------------+---------+-----------------------+----------------+--------------+
| attribute_id | type_id | attribute_name        | attribute_type | insert_count |
+--------------+---------+-----------------------+----------------+--------------+
|          985 |      12 | Seconds_Behind_Master | long           |            0 | 
+--------------+---------+-----------------------+----------------+--------------+
1 row in set (0.00 sec)


It is type 12. What does that mean?

CODE:
mysql> select * from inventory_types where type_id = 12;
+---------+-------------+--------------+--------------+--------------------+
| type_id | type_name   | namespace_id | insert_count | type_discriminator |
+---------+-------------+--------------+--------------+--------------------+
|      12 | SlaveStatus |            1 |            0 | mysql.SlaveStatus  | 
+---------+-------------+--------------+--------------+--------------------+
1 row in set (0.00 sec)


and all attributes from the type_id = 12 group are indeed the output of SHOW SLAVE STATUS and friends:

CODE:
mysql> select attribute_id, attribute_name, attribute_type from inventory_attributes where type_id = 12;
+--------------+-------------------------------+----------------+
| attribute_id | attribute_name                | attribute_type |
+--------------+-------------------------------+----------------+
|          955 | Binlog_Do_DB                  | string         | 
|          956 | Binlog_Ignore_DB              | string         | 
...
|          985 | Seconds_Behind_Master         | long           | 
...
|          991 | Until_Log_File                | string         | 
|          992 | Until_Log_Pos                 | long           | 
+--------------+-------------------------------+----------------+
46 rows in set (0.00 sec)


We can then grab data we want from inventory_instance_attributes easily.

Interestingly the time values here are milliseconds since epoch, and we do find the current values in this table (and also in a much more complicatedly accessible fact table, but since we want current values, we are fine here):

CODE:
mysql> select instance_attribute_id, instance_id, from_unixtime(latest_save_time/1000) as ts, latest_long_value as delay from inventory_instance_attributes where attribute_id = 985 and latest_long_value > 10 order by ts desc;
+-----------------------+-------------+---------------------+-------+
| instance_attribute_id | instance_id | ts                  | delay |
+-----------------------+-------------+---------------------+-------+
|              10703849 |     5257758 | 2011-07-20 15:11:02 |  2682 | 
|              10621167 |     5246743 | 2011-07-20 15:11:02 |  1477 | 
|               9279503 |     4854163 | 2011-07-20 15:11:01 |   179 | 
|               9464838 |     4884784 | 2011-07-20 15:11:01 | 55767 | 
|              10489130 |     5229908 | 2011-07-20 15:11:00 |   104 | 
|               9725056 |     4915393 | 2011-07-20 15:11:00 |  6075 | 
|               2466182 |     1639188 | 2011-07-19 20:59:05 |  1826 | 
|              10554251 |     5237926 | 2011-07-08 15:01:01 | 56118 | 
|              10529196 |     5234991 | 2011-07-07 14:52:03 |  8429 | 
|               9293669 |     4855579 | 2011-07-07 11:29:00 |  3257 | 
|               9281958 |     4854393 | 2011-07-07 11:24:02 |  3003 | 
|               4720653 |     2818144 | 2011-07-05 14:04:02 | 40525 | 
|              10402648 |     5219719 | 2011-06-29 16:01:02 | 52717 | 
|              10261664 |     5145134 | 2011-06-11 13:19:01 |  5770 | 
|               9461829 |     4884601 | 2011-05-31 16:46:01 |   588 | 
|               8533129 |     4526662 | 2011-04-05 23:48:41 |   669 | 
+-----------------------+-------------+---------------------+-------+
16 rows in set (0.00 sec)


We do see that some databases have not been reporting in for a long time (check ts). We might want to limit ourselves to recent data (say, the last hour):

CODE:
mysql> select instance_attribute_id, instance_id, from_unixtime(latest_save_time/1000) as ts, latest_long_value as delay from inventory_instance_attributes where attribute_id = 985 and latest_long_value > 10 and latest_save_time > unix_timestamp(now()-interval 1 hour)*1000 order by delay desc;
+-----------------------+-------------+---------------------+-------+
| instance_attribute_id | instance_id | ts                  | delay |
+-----------------------+-------------+---------------------+-------+
|               9464838 |     4884784 | 2011-07-20 15:12:01 | 55787 | 
|               9725056 |     4915393 | 2011-07-20 15:12:00 |  5820 | 
|              10703849 |     5257758 | 2011-07-20 15:12:02 |  2729 | 
|              10621167 |     5246743 | 2011-07-20 15:12:02 |  1530 | 
|              10489130 |     5229908 | 2011-07-20 15:12:00 |    44 | 
+-----------------------+-------------+---------------------+-------+
5 rows in set (0.00 sec)


Now if we only had the host names of these databases, that would be really useful.

Unfortunately, an instance_id is not for a database instance - an instance in merlin is a reporting set of data, so it is assigned per database per type. We need to check out the inventory_instances to turn a pair of instance_id, type_id into a different instance_id, type_id pair for the same host.

CODE:
mysql> select * from inventory_instances where instance_id = 4884784;
+-------------+--------------------------------------+---------+-----------+--------------+-----------------------+
| instance_id | instance_name                        | type_id | parent_id | insert_count | inv_obj_discriminator |
+-------------+--------------------------------------+---------+-----------+--------------+-----------------------+
|     4884784 | f044d6c1-a6af-4279-8229-731fb89ba13a |      12 |   4884761 |            0 | mysql.SlaveStatus     | 
+-------------+--------------------------------------+---------+-----------+--------------+-----------------------+
1 row in set (0.00 sec)


CODE:
mysql> select instance_id, type_id from inventory_instances where instance_name = 'f044d6c1-a6af-4279-8229-731fb89ba13a';
+-------------+---------+
| instance_id | type_id |
+-------------+---------+
|     4884761 |       2 | 
|     4884791 |       3 | 
|     4884792 |      10 | 
|     4884784 |      12 | 
...


So the join needed goes from inventory_instance_attributes (attribute 985, seconds behind master) through inventory_instances for the instance_id to inventory_instances on instance_name on to inventory_instance_attributes again, for attribute 11 (type_id 3) for the hostname.

And indeed a quick and simple query does the trick:

CODE:
mysql> select 
   ->   from_unixtime(a.latest_save_time/1000) as ts, 
   ->   a.latest_long_value as delay, 
   ->   b.latest_string_value as hostname
   -> from 
   ->   inventory_instance_attributes as a join
   ->   inventory_instances as ii1 on a.instance_id = ii1.instance_id join
   ->   inventory_instances as ii2 on ii1.instance_name = ii2.instance_name and ii1.type_id = 12 and ii2.type_id = 3 join
   ->   inventory_instance_attributes as b on ii2.instance_id = b.instance_id
   -> 
   -> where a.attribute_id = 985 
   ->   and b.attribute_id = 11
   ->   and a.latest_long_value > 10 
   ->   and a.latest_save_time > unix_timestamp(now()-interval 1 hour)*1000
   -> order by 
   ->   delay desc
   -> ;
+---------------------+-------+------------------------------------------+
| ts                  | delay | hostname                                 |
+---------------------+-------+------------------------------------------+
| 2011-07-20 15:27:01 | 56123 | db01resrdb-10     | 
| 2011-07-20 15:27:02 |  2830 | mc01monitorsmdb-01 | 
| 2011-07-20 15:27:01 |  1826 | mc01avbkup-01      | 
| 2011-07-20 15:27:03 |  1494 | mc01dwrdb-01       | 
+---------------------+-------+------------------------------------------+
4 rows in set (0.01 sec)


With a perfect plan:

CODE:
mysql> explain select    from_unixtime(a.latest_save_time/1000) as ts,    a.latest_long_value as delay,    b.latest_string_value as hostname from    inventory_instance_attributes as a join   inventory_instances as ii1 on a.instance_id = ii1.instance_id join   inventory_instances as ii2 on ii1.instance_name = ii2.instance_name and ii1.type_id = 12 and ii2.type_id = 3 join   inventory_instance_attributes as b on ii2.instance_id = b.instance_id  where a.attribute_id = 985    and b.attribute_id = 11   and a.latest_long_value > 10    and a.latest_save_time > unix_timestamp(now()-interval 1 hour)*1000 order by    delay desc\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: a
        type: ref
possible_keys: instance_id,FK87C68BDE9BBF95B9
         key: FK87C68BDE9BBF95B9
     key_len: 4
         ref: const
        rows: 320
       Extra: Using where; Using filesort
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: ii1
        type: eq_ref
possible_keys: PRIMARY,instance_name,FKD4320F5BBDD9C29B
         key: PRIMARY
     key_len: 4
         ref: mem.a.instance_id
        rows: 1
       Extra: Using where
*************************** 3. row ***************************
          id: 1
 select_type: SIMPLE
       table: ii2
        type: eq_ref
possible_keys: PRIMARY,instance_name,FKD4320F5BBDD9C29B
         key: instance_name
     key_len: 771
         ref: mem.ii1.instance_name,const
        rows: 1
       Extra: Using index
*************************** 4. row ***************************
          id: 1
 select_type: SIMPLE
       table: b
        type: eq_ref
possible_keys: instance_id,FK87C68BDE9BBF95B9
         key: instance_id
     key_len: 8
         ref: mem.ii2.instance_id,const
        rows: 1
       Extra: 
4 rows in set (0.00 sec)


We can turn this easily into a web page and put it onto the office control monitor for everybody to see.

Trackbacks

No Trackbacks

Comments

Display comments as Linear | Threaded

Henrik Ingo on :

If you're using Merlin, why don't you just create a new graph based on this data?

Kristian Köhntopp on :

Because what I want cannot be graphed: A list of hostnames that need attention, most needy machine first.

Add Comment

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA

BBCode format allowed