Sunday, February 20, 2011

Very slow query with high number of Rows_examined

We had some interesting problem on one production MYSQL server today.
One developer reported there's a delete statement taking very long time to run on small tables. 
The three tables involved this query, category_group, tmp_category_group category_group_info have 7400, 7600 and 25 rows respectively.
While checking slow query logs, shockingly I found the query has 3.8MM examined rows. Change the query to select statement will yield the same result.


# Query_time: 9  Lock_time: 0  Rows_sent: 0  Rows_examined: 3809728
delete from category_group where group_id = (select group_id from category_group_info x where x.group_name = 'Storylines') and category_id in ( select category_id from tmp_category_group cg where group_id = (select group_id from category_group_info x where x.group_name = 'KidsCharacter'));


# Query_time: 8  Lock_time: 0  Rows_sent: 0  Rows_examined: 3810182
select * from category_group where group_id = (select group_id from category_group_info x
 where x.group_name = 'Storylines') and category_id in
 ( select category_id from tmp_category_group cg where group_id =
 (select group_id from category_group_info x where x.group_name = 'KidsCharacter'));


The explain plan doesn't provide much help.


mysql> explain select * from category_group where group_id = (select group_id from category_group_info x where x.group_name = 'Storylines') and category_id in ( select category_id from tmp_category_group cg where group_id = (select group_id from category_group_info x where x.group_name = 'KidsCharacter'));
+----+--------------------+----------------+------+---------------+----------+---------+-------+------+--------------------------+
| id | select_type        | table          | type | possible_keys | key      | key_len | ref   | rows | Extra                    |
+----+--------------------+----------------+------+---------------+----------+---------+-------+------+--------------------------+
|  1 | PRIMARY            | category_group | ref  | group_id      | group_id | 4       | const |  501 | Using where; Using index | 
|  3 | DEPENDENT SUBQUERY | cg             | ALL  | NULL          | NULL     | NULL    | NULL  | 7396 | Using where              | 
|  4 | SUBQUERY           | x              | ALL  | NULL          | NULL     | NULL    | NULL  |   25 | Using where              | 
|  2 | SUBQUERY           | x              | ALL  | NULL          | NULL     | NULL    | NULL  |   25 | Using where              | 
+----+--------------------+----------------+------+---------------+----------+---------+-------+------+--------------------------+
4 rows in set (0.01 sec)

After extensive testing, I figure out the query will run fast if we take off where clause of this subquery.

select category_id from tmp_category_group cg where group_id = ...

That means the query run much faster on whole table of tmp_category_group than having a predicate. That doesn't make sense right?
That end up leading me to add an index on group_id of tmp_category_group and the query running lightening fast now. 


Could this be some sort of Bug?

mysql> select * from category_group where group_id = (select distinct group_id from category_group_info x where x.group_name = 'Storylines') and category_id in ( select distinct category_id from tmp_category_group cg where group_id = (select distinct group_id from category_group_info x where x.group_name = 'KidsCharacter'));
Empty set (0.31 sec)

mysql> delete from category_group
-> where group_id = (select group_id from category_group_info x where x.group_name = 'Storylines')
-> and category_id in (
-> select category_id
-> from tmp_category_group cg
-> where group_id = (select group_id from category_group_info x where x.group_name = 'KidsCharacter'));
Query OK, 0 rows affected (0.36 sec) 

Thursday, February 10, 2011

How Seconds_behind_master is computed

Found a very good explanation of how Seconds_behind_master is computed in this bug report.

http://bugs.mysql.com/bug.php?id=2826


Here's how Seconds_behind_master is computed.

* When the slave connects to the master (when the slave I/O thread connects to the
master),
it does a = SELECT UNIX_TIMESTAMP() on master,
and b= SELECT UNIX_TIMESTAMP() on slave, and computes c=b-a.

* Every statement executed on the master is stored on the master's binary log together
with the timestamp when it was executed.

* So when the slave SQL thread reads a statement from the binary log, it executes it and
then it does:

seconds_behind_master= SELECT UNIX_TIMESTAMP() on slave
- timestamp of the master's binary log event
- c