Skip to content

Getting SQL from a SPAN port

Recently I needed the query stream hitting a very busy master. Normally I would have been using the MySQL Proxy to collect queries, but on a very busy machine the Proxy is as much of a problem as it is part of the solution, so I chose a different approach.

I had a SPAN port configured for the master, which is Ciscospeak for a port on a switch which mirrors all traffic of one or more other ports. I had an otherwise idle machine listening to the SPAN port on a spare network card. That way it is possible to collect traffic to and from the master without any interference with the master.

On the listener box, I had tcpflow collecting data to my master (and only traffic to, not from the master):
tcpflow -i eth1 dst master and port 3306
These tcpflow files now need to be processed into a slow-log like format for further processing. For that I wrote a very simple processor in C after some experimentation with tcpdump and mk-query-digest had been shown as being too slow to keep up.

The processor is called extract_queries and it's souce can be found below. It would be used like so:
# mkdir flow
# cd flow
# tcpflow -i eth1 dst master and port 3306
(wait 1h)
# cd ..
# find flow -print0 | xargs -0 extract_queries -u > slow
# mysqldumpslow -s c slow > stats

The Source: (extract_queries.c)


No Trackbacks


Display comments as Linear | Threaded

Baron on :

Hi Kristian! I am curious whether you ran mk-query-digest with --no-report. That tends to be the most intensive part of its work, and if you're just using it to print out the queries in slow-log format you don't need it.

Isotopp on :

I ran it with and without --no-report, it did not matter much.

The box is question is receiving around 7000 qps. Dumps are generated around 40 times faster than they can be processed. The approach of mk-query-digest is flawed at multiple levels:

1. It is parsing ASCII tcpdump -X output, instead of using libpcap and unpack to decode the protocol.
2. It is doing too many things in one program. It is very hard to understand the internal logic without reading the source. Smaller, one-program-per-problem type of programs would be much easier to optimize and easier to understand.

The entire thing needs to be dissected into a bunch of a half dozen tools, each of which is doing just one thing and fast. And natively, not dumping and parsing ASCII hexdumps as an intermediate format. Preferrably in a way that can utilize multiple cores of a multicore box to crunch stuff.

Baron on :

I think those are strengths for many purposes, but to each his own!

Add Comment

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.

BBCode format allowed