Getting SQL from a SPAN port
Recently I needed the query stream hitting a very busy master. Normally I would have been using the MySQL Proxy to collect queries, but on a very busy machine the Proxy is as much of a problem as it is part of the solution, so I chose a different approach.
I had a SPAN port configured for the master, which is Ciscospeak for a port on a switch which mirrors all traffic of one or more other ports. I had an otherwise idle machine listening to the SPAN port on a spare network card. That way it is possible to collect traffic to and from the master without any interference with the master.
On the listener box, I had tcpflow collecting data to my master (and only traffic to, not from the master):
The processor is called extract_queries and it's souce can be found below. It would be used like so:
The Source: (extract_queries.c)
I had a SPAN port configured for the master, which is Ciscospeak for a port on a switch which mirrors all traffic of one or more other ports. I had an otherwise idle machine listening to the SPAN port on a spare network card. That way it is possible to collect traffic to and from the master without any interference with the master.
On the listener box, I had tcpflow collecting data to my master (and only traffic to, not from the master):
CODE:
tcpflow -i eth1 dst master and port 3306
These tcpflow files now need to be processed into a slow-log like format for further processing. For that I wrote a very simple processor in C after some experimentation with tcpdump and mk-query-digest had been shown as being too slow to keep up.The processor is called extract_queries and it's souce can be found below. It would be used like so:
CODE:
# mkdir flow
# cd flow
# tcpflow -i eth1 dst master and port 3306
(wait 1h)
(break)
# cd ..
# find flow -print0 | xargs -0 extract_queries -u > slow
# mysqldumpslow -s c slow > stats
# cd flow
# tcpflow -i eth1 dst master and port 3306
(wait 1h)
(break)
# cd ..
# find flow -print0 | xargs -0 extract_queries -u > slow
# mysqldumpslow -s c slow > stats
The Source: (extract_queries.c)
Comments
Display comments as Linear | Threaded
Baron on :
Isotopp on :
The box is question is receiving around 7000 qps. Dumps are generated around 40 times faster than they can be processed. The approach of mk-query-digest is flawed at multiple levels:
1. It is parsing ASCII tcpdump -X output, instead of using libpcap and unpack to decode the protocol.
2. It is doing too many things in one program. It is very hard to understand the internal logic without reading the source. Smaller, one-program-per-problem type of programs would be much easier to optimize and easier to understand.
The entire thing needs to be dissected into a bunch of a half dozen tools, each of which is doing just one thing and fast. And natively, not dumping and parsing ASCII hexdumps as an intermediate format. Preferrably in a way that can utilize multiple cores of a multicore box to crunch stuff.
Baron on :