Skip to main content
  1. posts/

Mysql插入删除死锁问题排查

·2804 words·6 mins

查看业务日志

image.png

查看死锁日志

show engine innodb status; (查询语句)

------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-01-12 08:24:23 0x7f5a5cf75700
*** (1) TRANSACTION:
//事务A
TRANSACTION 9400396, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 10
MySQL thread id 2138935, OS thread handle 140026474653440, query id 58338774 10.21.17.247 liuchuangzhao update
insert into t_device_online (device_id, start_time, end_time,online_times) values

            (133, 1641975738, 1641975861,1)
         ,
            (137, 1641975727, 1641975861,1)
         ,
            (138, 1641975743, null,1)
         ,
            (144, 1641975726, null,1)
         ,
            (146, 1641975742, null,1)
         ,
            (147, 1641975728, 1641975862,1)
         ,
            (150, 1641975724, null,1)
         ,
            (152, 1641975746, 1641975861,1)
         ,
            (154, 1641975744, null,1)
         ,
            (28, 1641975737, 1641975861,1)
         ,
            (158, 1641975744, null,1)
         ,
            (32, 1641975736, 1641975862,1)
         ,
            (162, 1641975733, null,1)
         ,
            (36, 1641975736, 1641975861,1)
         ,
            (168, 1641975729, null,1)
         ,
            (40, 1641975732, n
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 61 page no 30559 n bits 824 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 9400396 lock_mode X locks gap before rec insert intention waiting
*** (2) TRANSACTION:
//事务B
TRANSACTION 9400297, ACTIVE 1 sec fetching rows
mysql tables in use 1, locked 1
72 lock struct(s), heap size 24784, 1748 row lock(s), undo log entries 5
MySQL thread id 2138649, OS thread handle 140026083497728, query id 58338328 10.21.17.247 liuchuangzhao updating
delete from t_device_online where end_time is null
            and device_id in
             (
                133
             ,
                137
             ,
                138
             ,
                144
             ,
                146
             ,
                147
             ,
                150
             ,
                152
             ,
                154
             ,
                28
             ,
                158
             ,
                32
             ,
                162
             ,
                36
             ,
                168
             ,
                40
             ,
                10536
             ,
                169
             ,
                170
             ,
                42
             ,
                10411
             ,
                171
             ,
                172

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 61 page no 30559 n bits 824 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 9400297 lock_mode X locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 61 page no 30545 n bits 672 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 9400297 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (1)
事务A(insert)事务B(delete)
持有行锁 824 index(X锁)
等行锁 824 index(X锁)
等行锁 672 index(X锁)
回滚

从死锁日志上,可以,事务A并没有持有事务B所需要的资源啊,但是从现象上来看,事务A应该是持有了672的行锁。那我们就必须先了解insert的加锁过程。

insert加锁

image.png

insert加锁过程

  1. 先加插入意向Gap锁(insert intention gap lock)【如果别的事务已经有了这个间隙的锁Gap Lock,就无法加insert intention gap lock】
  2. 然后对插入的记录加索引记录锁(index-record lock)不会加gap锁,不影响其他insert的执行,除非他们插入的记录的索引值相同。
  3. 如果插入的记录索引值相同,则会出现duplicate-key error,就会对改索引加一个共享锁(shared lock)。但是如果有多个请求同时插入同一个索引值,这种情况可能会出现死锁。

举个例子:

事务A事务B事务C
START TRANSACTION;
INSERT INTO t1 VALUES(1);START TRANSACTION;START TRANSACTION;
获取到index-record lock(也是这个行记录的排查锁)INSERT INTO t1 VALUES(1);INSERT INTO t1 VALUES(1);
增加shared lock遇到duplicate-key error遇到duplicate-key error
等待shared lock等待shared lock
ROLLBACK;
释放shared lock获取到shared lock获取到shared lock
释放排查锁等待排查锁等待排查锁
死锁死锁

事务B和C都获取到了shared lock,都在等待排他锁,但是排他锁和shared lock互斥,所以事务B和C都获取不到排他锁。(想要获取排他锁必须等对方释放shared lock,但是这是不可能的) 还有一种情况也会产生死锁

事务A事务B事务C
START TRANSACTION;
delete from t1 where id = 1;START TRANSACTION;START TRANSACTION;
获取到index-record lock(也是这个行记录的排查锁)INSERT INTO t1 VALUES(1);INSERT INTO t1 VALUES(1);
增加shared lock遇到duplicate-key error遇到duplicate-key error
等待shared lock等待shared lock
commit;
释放shared lock获取到shared lock获取到shared lock
释放排查锁等待排查锁等待排查锁
死锁死锁

结论

三个以上的并发插入,如果一个回滚了,可能会存在死锁(原因是出现duplicate-key error,其他事务都获取不到排他锁**)** 一个删除,两个并发插入,删除提交后也会造成死锁。

Delete加锁

image.png
delete加锁过程

  1. 设置一个next-key的排他锁在每个搜索的行记录。(意味着除了占住行锁,还会占住间隙锁)
  2. 对于使用唯一索引搜索的情况只会对搜索的行加索引记录锁

复现

复现方式1

image.png
image.png
执行步骤:

事务A事务B
开始事务
开始事务
插入133
删除137
插入137
提交
死锁

死锁日志

------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-01-19 12:38:16 0x7f59772d9700
*** (1) TRANSACTION:
//事务A
TRANSACTION 10649448, ACTIVE 136 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 2419529, OS thread handle 140026471950080, query id 66144333 10.21.17.247 liuchuangzhao update
/* ApplicationName=DBeaver 21.0.0 - SQLEditor <dev_with_high_pri.sql> */ insert into t_device_online (device_id, start_time, end_time,online_times) values(137, 1641975727, 1641975861,1)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 61 page no 30554 n bits 824 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 10649448 lock_mode X locks gap before rec insert intention waiting
*** (2) TRANSACTION:
//事务B
TRANSACTION 10649665, ACTIVE 28 sec fetching rows
mysql tables in use 1, locked 1
35 lock struct(s), heap size 8400, 203 row lock(s), undo log entries 100
MySQL thread id 2419581, OS thread handle 140022228293376, query id 66144549 10.21.17.247 liuchuangzhao updating
/* ApplicationName=DBeaver 21.0.0 - SQLEditor <Script-4.sql> */ delete from t_device_online where device_id in (133)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 61 page no 30554 n bits 824 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 10649665 lock_mode X locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 61 page no 30545 n bits 712 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 10649665 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (1)

复现方式2

image.png
image.png
执行过程

事务A事务B
开始事务
开始事务
插入137
删除137,133
插入133
死锁

死锁日志

------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-01-19 12:58:59 0x7f5a74376700
*** (1) TRANSACTION:
//事务A
TRANSACTION 10652218, ACTIVE 4 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 1
MySQL thread id 2420044, OS thread handle 140022237755136, query id 66160375 10.21.17.247 liuchuangzhao updating
/* ApplicationName=DBeaver 21.0.0 - SQLEditor <Script-4.sql> */ delete from t_device_online where device_id in (137,133)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 61 page no 30554 n bits 824 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 10652218 lock_mode X waiting
*** (2) TRANSACTION:
//事务B
TRANSACTION 10652205, ACTIVE 10 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 2420043, OS thread handle 140026473572096, query id 66160420 10.21.17.247 liuchuangzhao update
/* ApplicationName=DBeaver 21.0.0 - SQLEditor <dev_with_high_pri.sql> */ insert into t_device_online (device_id, start_time, end_time,online_times) values
(133, 1641975727, 1641975861,1)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 61 page no 30554 n bits 824 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 10652205 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 61 page no 30545 n bits 712 index device_id of table `ifp_remote_platform`.`t_device_online` trx id 10652205 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (2)

死锁原因分析

虽然复现方式不一样,但是死锁日志是一样的。

复现1原因分析:

事务A事务B
开始事务
开始事务
插入133(插入index-recode-lock,意向排他锁,是一个隐式排他锁)
删除137(获取了137的行锁及周围间隙锁)
删除133
(发现133有隐式排他锁,就帮他加上记录锁。
等待133的排他锁,也就是133的行锁)
插入137(等待137周围的间隙锁)
死锁

复现2原因分析:

事务A事务B
开始事务
开始事务
插入137(插入index-recode-lock,意向排他锁,是一个隐式排他锁)
删除137,133 (获取了133的行锁,等待137的行锁)【这里可以说明删除会对in重排序,不然不会造成死锁】
插入133(等待133的行锁)
死锁

业务死锁分析

  1. insert执行流程长,拿到了一些device_id
  2. insert执行过程中,delete 对 in 里面的device_id进行排序,然后先于insert拿到后面要执行的device_id,但是需要等待insert已经获取的device_id。
  3. insert要等delete已获取的device_id
  4. 就死锁了

待验证 为什么delete会对in里面的device_id重排序? 我觉得是因为删除语句没有走索引,要全表扫描,mysql为了避免随机IO,就对in的device_id排序了。 为什么insert 不会? insert默认是通过主键id去查找,然后插入都是很快的,所以不需要重排序。 ​

复现1和复现2和最开始的死锁日志一致。 如果复现1还不能说明问题的话,复现2就很能说明问题了。只要insert的够慢,delete 先拿到insert接下来要删除的行锁,就会死锁。 ​

参考

https://blog.csdn.net/varyall/article/details/80219459 (insert加锁分析) https://cloud.tencent.com/developer/article/1900240 (insert加锁分析) https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html (mysql官网innodb加锁说明) https://www.docs4dev.com/docs/zh/mysql/5.7/reference/innodb-locks-set.html(中文文档)