Enemy Action

"Hey, Kris! We got a connection surge on $IMPORTANT_MASTER, and a short activity drop preceding that. All other graphs are flat."






I am looking.
All other graphs are more or less normal, indeed. But around 13:20 there is a short full stop for all processing, and then said surge.

Now, we do have log_processlist.pl. So if there is a reason for the locking, I will have logs in /var/log/mysql_pl/Wed/13_2[01]*. Those files are indeed larger, but only slightly: they show a bit over 400 connections, up from the regular 250.
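As a rough illustration of that check, here is a minimal Python sketch that lists the files matched by the glob together with their sizes, so that unusually large snapshots stand out. The function name `snapshot_sizes` is made up for this example, and nothing is assumed about the file naming scheme or contents beyond the glob pattern quoted above.

```python
from pathlib import Path


def snapshot_sizes(directory, pattern="13_2[01]*"):
    """Return (file name, size in bytes) for each file matching the glob.

    The default pattern is the one from the post: names starting with
    13_20 or 13_21. Results are sorted by file name.
    """
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(directory).glob(pattern)
        if path.is_file()
    )
```

Calling `snapshot_sizes("/var/log/mysql_pl/Wed")` would cover the two minutes in question; widening the pattern to `13_*` gives the neighbouring minutes as a baseline for comparison.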

And: The logs are clean. Nothing irregular, whatsoever. So either our monitoring has big holes, or this is not a server stall and pileup, but has an external cause.

This is a master. Masters have a datadir on NetApp. I am asking the original client to talk to the Filer people. Maybe they have been playing with their toys at that time. While the original client is walking away, I am checking more logs. It is not as if we have a shortage of them.

And indeed:

Oct 31 13:20:10 master kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Down
Oct 31 13:20:20 master kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
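To spot such link flaps without reading the log by eye, a small Python sketch along the following lines could pair each "Link is Down" message with the next "Link is Up" for the same interface. The regular expression is written only against the two bnx2 lines shown above, so other drivers will word their messages differently. The function name `link_outages` and the `year` parameter are assumptions of this example (syslog timestamps carry no year), and the month abbreviation is parsed with the current locale.

```python
import re
from datetime import datetime

# Matches lines shaped like the two kernel messages quoted above.
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:.*?"
    r"(?P<iface>\w+): NIC Copper Link is (?P<state>Down|Up)"
)


def link_outages(lines, year=2000):
    """Return a list of (interface, time link went down, outage in seconds).

    The year is an assumption supplied by the caller; the default is a
    leap year so that a Feb 29 timestamp still parses.
    """
    down_at = {}   # interface -> time of the pending "Down" event
    outages = []
    for line in lines:
        match = LINK_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
        iface = match["iface"]
        if match["state"] == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            start = down_at.pop(iface)
            outages.append((iface, start, (ts - start).total_seconds()))
    return outages
```

Fed the two lines above, this reports a single outage on eth0 lasting 10 seconds, which lines up with the short full stop around 13:20.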


So the Filer people are off the hook. Instead, we have caught some friendly fire from the data center team. Maybe somebody assumed that all boxes are created equal because they look equal. Let's see if that is the case, and educate them about box labelling.

Sometimes the database is not the cause.

Comments

Kristian Köhntopp on :

My friendly Offtopic Channel in IRC has horror stories about this. It is Halloween, after all.

"Well, I have indeed seen both ports of a bonding interface fail because somebody faulted both cables simultaneously while doing rack maintenance."

"Shouldn't these two cables end in two different racks? Or do you mean the server's rack? 'I just wanted to swap these two. And then my donut was rolling under the rack.'"

"Nice one: 'The server complains about a faulty PSU, but does not tell which. And both fans are still moving.'"

"Yes, I have seen redundant cabling where Server === Switch. Also, all four PSUs have been plugged into one single power strip."

"Oh, that cancelled change request. Yes, we just decided to do that anyway."
