Last week I was with a customer and came across a side problem that was of a more general nature. It is the problem of users having a number of things that use up ressources. The things may be mails, recordings, stamps or whatever - for simplicity we assume recordings, and the ressource consumed is size. The system has a quota system in place, which stores a ressource limit per user.
The quota problem is: if a user is over quota, we want to get a number of item ids to delete per user from old to new, until that user is under quota or just one item over quota. This is the quota problem, and I needed a quota query that solves it.
Here is the setup.
In the system there is a very simple user table which holds user ids and quota values. The system also keeps a column called summary as a denormalisation, which is a sum of the ressources consumed by this user.
mysql> show create table user\\G
Table: user
Create Table: CREATE TABLE `user` (
`uid` bigint(20) unsigned NOT NULL auto_increment,
`summary` bigint(20) unsigned NOT NULL,
`quota` bigint(20) unsigned NOT NULL,
UNIQUE KEY `uid` (`uid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> select * from user;
+-----+---------+-------+
| uid | summary | quota |
+-----+---------+-------+
| 1 | 0 | 50000 |
| 2 | 0 | 15000 |
+-----+---------+-------+
2 rows in set (0.00 sec)
The items themselves are held in a userdata table Each recording has an id, belongs to a user from the user table, has a size and a recording date.
mysql> show create table recording\\G
Table: recording
Create Table: CREATE TABLE `recording` (
`recid` bigint(20) unsigned NOT NULL auto_increment,
`uid` bigint(20) unsigned NOT NULL,
`size` bigint(20) unsigned NOT NULL,
`recdate` datetime NOT NULL,
UNIQUE KEY `recid` (`recid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
root@localhost [kris]> select * from recording;
+-------+-----+-------+---------------------+
| recid | uid | size | recdate |
+-------+-----+-------+---------------------+
| 1 | 1 | 10000 | 2005-12-08 17:52:20 |
| 2 | 1 | 20000 | 2005-12-11 17:52:20 |
| 3 | 1 | 10000 | 2005-12-15 17:52:20 |
| 4 | 2 | 10000 | 2005-12-13 17:53:04 |
| 5 | 2 | 10000 | 2005-12-14 17:53:04 |
+-------+-----+-------+---------------------+
5 rows in set (0.00 sec)
The summaries can be created using a query like this:
mysql> select uid
-> , sum(size) as summary
-> from recording
-> group by uid;
+-----+---------+
| uid | summary |
+-----+---------+
| 1 | 40000 |
| 2 | 20000 |
+-----+---------+
2 rows in set (0.00 sec)
If we want to store them in the user table as a denormalisation, you have to use an update statement similar to the following:
mysql> update user
-> set summary = (
-> select sum(size)
-> from recording
-> where recording.uid = user.uid
-> group by uid
-> );
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0
Check that the values are okay now:
mysql> select * from user;
+-----+---------+-------+
| uid | summary | quota |
+-----+---------+-------+
| 1 | 40000 | 50000 |
| 2 | 20000 | 15000 |
+-----+---------+-------+
2 rows in set (0.00 sec)
Users over quota can be found with a simple comparison of summary and quota now:
mysql> select uid
-> , summary-quota as overquota
-> from user
-> where summary > quota;
+-----+-----------+
| uid | overquota |
+-----+-----------+
| 2 | 5000 |
+-----+-----------+
1 row in set (0.00 sec)
We now want to delete old recordings from the user with uid 2 from old to new until the user is under quota again.
This query lists all recordings from all users over quota.
mysql> select r.recid
-> , r.size
-> , r.recdate
-> , u.uid
-> , u.summary-u.quota as overquota
-> from user as u join recording as r on u.uid = r.uid
-> where u.summary > u.quota
-> order by r.recdate asc;
+-------+-------+---------------------+-----+-----------+
| recid | size | recdate | uid | overquota |
+-------+-------+---------------------+-----+-----------+
| 4 | 10000 | 2005-12-13 17:53:04 | 2 | 5000 |
| 5 | 10000 | 2005-12-14 17:53:04 | 2 | 5000 |
+-------+-------+---------------------+-----+-----------+
2 rows in set (0.00 sec)
I figured that if I had some kind of per-user running sum column for the size, I could easily restrict this list even further so that only items are returned until the running sum reaches the overquota value.
Because I was busy with other things for the customer, I threw the problem at Kai and Jan in the everpresent MySQL internal help channel, and they produced two vastly different, but interesting solutions for running sums. The solutions are also kind of characteristic for their occupation and work inside MySQL.
Kai is a trainer, teaching SQL. His approach to the running sum problem is by the book, and his solution is very SQLish and beautiful with a self join:
mysql> select r1.recdate
-> , r1.recid,
-> , r1.uid,
-> , sum(r2.size) as runningsum
-> from recording r1 join recording r2 using (uid)
-> where r1.recdate >= r2.recdate
-> group by r1.recid
-> order by r1.uid, r1.recdate;
+---------------------+-------+-----+------------+
| recdate | recid | uid | runningsum |
+---------------------+-------+-----+------------+
| 2005-12-08 17:52:20 | 1 | 1 | 10000 |
| 2005-12-11 17:52:20 | 2 | 1 | 30000 |
| 2005-12-15 17:52:20 | 3 | 1 | 40000 |
| 2005-12-13 17:53:04 | 4 | 2 | 10000 |
| 2005-12-14 17:53:04 | 5 | 2 | 20000 |
+---------------------+-------+-----+------------+
5 rows in set (0.00 sec)
To understand this, we have to unfold the query by leaving off the GROUP BY clause.
mysql> select *
-> from recording r1 join recording r2 using (uid)
-> where r1.recdate >= r2.recdate
-> order by r1.uid, r1.recdate;
+-----+-------+-------+---------------------+-------+-------+---------------------+
| uid | recid | size | recdate | recid | size | recdate |
+-----+-------+-------+---------------------+-------+-------+---------------------+
| 1 | 1 | 10000 | 2005-12-08 17:52:20 | 1 | 10000 | 2005-12-08 17:52:20 |
| 1 | 2 | 20000 | 2005-12-11 17:52:20 | 1 | 10000 | 2005-12-08 17:52:20 |
| 1 | 2 | 20000 | 2005-12-11 17:52:20 | 2 | 20000 | 2005-12-11 17:52:20 |
| 1 | 3 | 10000 | 2005-12-15 17:52:20 | 3 | 10000 | 2005-12-15 17:52:20 |
| 1 | 3 | 10000 | 2005-12-15 17:52:20 | 1 | 10000 | 2005-12-08 17:52:20 |
| 1 | 3 | 10000 | 2005-12-15 17:52:20 | 2 | 20000 | 2005-12-11 17:52:20 |
| 2 | 4 | 10000 | 2005-12-13 17:53:04 | 4 | 10000 | 2005-12-13 17:53:04 |
| 2 | 5 | 10000 | 2005-12-14 17:53:04 | 4 | 10000 | 2005-12-13 17:53:04 |
| 2 | 5 | 10000 | 2005-12-14 17:53:04 | 5 | 10000 | 2005-12-14 17:53:04 |
+-----+-------+-------+---------------------+-------+-------+---------------------+
9 rows in set (0.00 sec)
For each recording in r1, we list all recordings from the same user (the USING clause) with recording dates before (or equal) to the current recording (the WHERE clause). For recid 1, this is only the recording 1 itself. For recid 2, these are recordings 1 and 2, and for #3, recordings 1-3 are matched. The same happens for recording #4, for which only #4 itself is listed, and for recording #5, which is matched on the right hand side by numbers 4 and 5.
Using the GROUP BY clause, we fold the right hand side on itself so that rows with the same r1.recid are considered a group, and look only at the sum of the sizes from the right hand side. This produces our query result.
Is this fast?
Well, right now: no.
As the query is written right now, it walks the entire recoding table, because at the moment we don't restrict it with a set of UIDs which are over quota. Instead we create a running sum for all records. So the badness of this query is at least n for n rows in the recording table. In the explain we expect to see an ALL for the first table, and we have to try to limit the badness in the second table.
mysql> explain
-> select r1.recdate
-> , r1.recid,
-> , r1.uid,
-> , sum(r2.size) as runningsum
-> from recording r1 join recording r2 using (uid)
-> where r1.recdate >= r2.recdate
-> group by r1.recid
-> order by r1.uid, r1.recdate;
id: 1
select_type: SIMPLE
table: r1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
Extra: Using temporary; Using filesort
id: 1
select_type: SIMPLE
table: r2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5
Extra: Using where
2 rows in set (0.01 sec)
Currently, nothing is there to help MySQL with the join. We can see that the optimizer tries - "using where" is present, but there are no possible_keys, and consequently no keys are used.
The join is on uid, so on index on that column would be a good first idea.
mysql> alter table recording add index ( uid );
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> explain
-> select r1.recdate
-> , r1.recid,
-> , r1.uid,
-> , sum(r2.size) as runningsum
-> from recording r1 join recording r2 using (uid)
-> where r1.recdate >= r2.recdate
-> group by r1.recid
-> order by r1.uid, r1.recdate;
id: 1
select_type: SIMPLE
table: r1
type: ALL
possible_keys: uid
key: NULL
key_len: NULL
ref: NULL
rows: 5
Extra: Using temporary; Using filesort
id: 1
select_type: SIMPLE
table: r2
type: ALL
possible_keys: uid
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra: Using where
2 rows in set (0.00 sec)
We can see that the optimizer considers that index, but does not use it. That's because the table is so small. If we run
mysql> insert into recording (
-> uid
-> , size
-> , recdate
-> ) select
-> uid
-> , size
-> , recdate
-> from recording;
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
a few times, things change:
mysql> select count(*) from recording;
+----------+
| count(*) |
+----------+
| 320 |
+----------+
1 row in set (0.00 sec)
mysql> analyze table recording;
+----------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| kris.recording | analyze | status | OK |
+----------------+---------+----------+----------+
1 row in set (0.00 sec)
mysql> explain
-> select r1.recdate
-> , r1.recid,
-> , r1.uid,
-> , sum(r2.size) as runningsum
-> from recording r1 join recording r2 using (uid)
-> where r1.recdate >= r2.recdate
-> group by r1.recid
-> order by r1.uid, r1.recdate;
id: 1
select_type: SIMPLE
table: r1
type: ALL
possible_keys: uid
key: NULL
key_len: NULL
ref: NULL
rows: 320
Extra: Using temporary; Using filesort
id: 1
select_type: SIMPLE
table: r2
type: ref
possible_keys: uid
key: uid
key_len: 8
ref: kris.r1.uid
rows: 160
Extra: Using where
2 rows in set (0.00 sec)
Now the index is being used, and the multiplier is 160. It is 160, because we duplicated the existing data so many times instead of using proper test data in the first place and we have only two possible user ids. Without the "analyze table", the multiplier listed is 11 instead of the correct value 160 - without table statistics on cardinality of values and value distribution, the optimizer makes assumptions, and in the case of our test setup, these assumptions are wrong.
Let us improve the quality of our test data a bit:
mysql> update recording set recdate = now() - interval rand()*30 day;
Query OK, 320 rows affected (0.00 sec)
Rows matched: 320 Changed: 320 Warnings: 0
mysql> update recording set recdate = now() - interval rand()*86400 second;
Query OK, 320 rows affected (0.00 sec)
Rows matched: 320 Changed: 320 Warnings: 0
mysql> update recording set uid = rand()*30+3 where recid > 5;
Query OK, 315 rows affected (0.00 sec)
Rows matched: 315 Changed: 315 Warnings: 0
mysql> analyze table recording;
+----------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| kris.recording | analyze | status | OK |
+----------------+---------+----------+----------+
1 row in set (0.00 sec)
mysql> insert into user (
-> uid
-> , summary
-> , quota
-> ) select uid
-> , sum(size) as summary
-> , 100000
-> from recording
-> where uid > 2
-> group by uid;
Query OK, 31 rows affected (0.00 sec)
Records: 31 Duplicates: 0 Warnings: 0
Now the real test data matches the original assumtions of the optimizer better, and is also closer to a real world scenario:
mysql> explain
-> select r1.recdate
-> , r1.recid,
-> , r1.uid,
-> , sum(r2.size) as runningsum
-> from recording r1 join recording r2 using (uid)
-> where r1.recdate >= r2.recdate
-> group by r1.recid
-> order by r1.uid, r1.recdate;
id: 1
select_type: SIMPLE
table: r1
type: ALL
possible_keys: uid
key: NULL
key_len: NULL
ref: NULL
rows: 320
Extra: Using temporary; Using filesort
id: 1
select_type: SIMPLE
table: r2
type: ref
possible_keys: uid
key: uid
key_len: 8
ref: kris.r1.uid
rows: 10
Extra: Using where
2 rows in set (0.00 sec)
Still we have an annoying multiplier of 10 in our query...
Jan is a developer, creating code within MySQL. His approach to the running sum problem is more procedural, a bit frightening and avoids the join by keeping state inside the select statement.
Here is his solution:
mysql> set @a := 0;
Query OK, 0 rows affected (0.00 sec)
mysql> select recdate
-> , recid
-> , uid
-> , runningsum
-> from (
-> select recdate
-> , recid
-> , uid
-> , @cur_uid := uid
-> , @a := if(@cur_uid <> ifnull(@last_uid, 0), size, ifnull(@a, 0) + size) as runningsum
-> , size
-> , @last_uid := uid
-> from recording
-> order by uid, recdate
-> ) as r;
+---------------------+-------+-----+------------+
| recdate | recid | uid | runningsum |
+---------------------+-------+-----+------------+
| 2005-12-08 17:52:20 | 1 | 1 | 10000 |
| 2005-12-11 17:52:20 | 2 | 1 | 30000 |
| 2005-12-15 17:52:20 | 3 | 1 | 40000 |
| 2005-12-13 17:53:04 | 4 | 2 | 10000 |
| 2005-12-14 17:53:04 | 5 | 2 | 20000 |
+---------------------+-------+-----+------------+
5 rows in set (0.00 sec)
Let us unfold the inner query to understand what is going on here:
mysql> set @a := 0;
Query OK, 0 rows affected (0.00 sec)
mysql> select @cur_uid := uid
-> , @a := if(@cur_uid <> ifnull(@last_uid, 0), size, ifnull(@a, 0) + size) as runningsum
-> , uid
-> , size
-> , @last_uid := uid
-> from recording
-> order by uid, recdate;
+-----------------+------------+-----+-------+------------------+
| @cur_uid := uid | runningsum | uid | size | @last_uid := uid |
+-----------------+------------+-----+-------+------------------+
| 1 | 10000 | 1 | 10000 | 1 |
| 1 | 20000 | 1 | 10000 | 1 |
| 1 | 40000 | 1 | 20000 | 1 |
| 2 | 10000 | 2 | 10000 | 2 |
| 2 | 20000 | 2 | 10000 | 2 |
+-----------------+------------+-----+-------+------------------+
5 rows in set (0.00 sec)
If this looks weird, it is because it is weird. We list the recording table completely and do so ordered by uid and recdate. Because we operate procedurally and with state, this is important - it is also evil, if you ask hardcore SQL pundits, but then remember: We are in here for speed, not beauty or portability.
It is important to understand that the expressions between SELECT and FROM are evaluated once for each row selected, and that they are evaluated from left to right - yes, even that is important, and therein lays additional evilness.
In our statement, we grab the current user id twice, once at the start of the statement as @cur_uid (we could actually be using uid directly instead), and once at the end of the statement as @last_uid. @last_uid is carried over to the next line (here is one part of the state we are operating on), and in our IF() clause we checkt for @cur_uid <> @last_uid. If that is the case, we have to reset the running sum counter @a to size (the other part of state we carry between the lines), otherwise we increment it by size.
We have to be careful with @last_uid and @a, because they might be NULL. If @last_uid is NULL, the comparison @cur_uid <> @last_uid would be neither 0 nor 1, but NULL regardless of the value of @cur_uid, because NULL<>anything is always NULL. To deal with this, we ifnull(@last_uid, 0). If @a is NULL, the sum would always be NULL, because NULL + anything is always NULL. To deal with this, we ifnull(@a, 0).
We now have the desired result, but also a lot of columns that are technical in nature and not part of the desired result. Using a subselect on an ad-hoc table, we drop them and only show what is requested - the original query.
The inner query is evil - that's why it is being wrapped in a subquery. If we want to join something against it, the number of executions of the stateful statements and their order must not change. The subquery wraps the breakable statements and protects them and their execution environment...