MySQL性能优化_牛客博客

一、存储数据结构

1、B+树

InnoDB引擎使用B+Tree实现。

B+Tree特性：

1）所有关键字都出现在叶子结点的链表中，且链表中的关键字恰好是有序的；

2）不可能在非叶子结点命中；

3）非叶子结点相当于是叶子结点的索引，叶子结点相当于是存储（Key）数据的数据层；

2、主键索引

主键索引，是聚集索引。叶子节点Value部分存储完整数据。

3、非主键索引

非主键索引，是非聚集(稀疏)索引。叶子节点Value部分存储主键。

4、复合(联合)索引

Key按顺序存储所有的复合键。

二、常用数据类型

1、CHAR

定长，选择策略：

1）固定长度或者平均长度与最大长度相差很小；

2）字段频繁更新，不易产生碎片；

CHAR(n)

n 表示存储的字符数

可使用sql查询数据字节数

SELECT id, LENGTH(<column>) from <table> order by id asc;

2、VARCHAR

不定长，选择策略：

1）平均长度与最大长度相差很大；

2）字段更新少；

VARCHAR(n)

n 表示存储的最大字符数；

3、INT

4字节，带符号最大值21亿，可用于id列；
INT(n)

n表示宽度，创建列时用zerofill修饰，查询时补0；

4、TINYINT

1字节，可用于枚举；
TINYINT(n)

n表示宽度，创建列时用zerofill修饰，查询时补0；

5、DATETIME

常用时间类型。
DATETIME(n）

n表示精度，n=3时，表示时间精确到毫秒；

三、查询计划

explain select <columns> from <table>

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	employees	range	idx_name_age_position	idx_name_age_position	78	1	100.00	Using index condition

1、id列

若存在子查询或者复杂查询，存在多行select；id大的优先级高，相同id的，从上到下执行；id为NULL优先级最低。

2、select_type列

1) simple

简单查询，不包含union和子查询。

2) primary

复杂查询中，最外层select。

3) subquery

select 中的子查询。

4) derived

from 中的子查询。

5）union

union后的select。

3、table列

表名、衍生表derivenN、unionN

4、type列

性能排序一般为：

NULL > system > const > eq_ref > ref > range > index > ALL

type 在range(包含)之前是可接受的，但还需要根据具体场景优化。

1）NULL

查询语句已被查询分析器优化。

2）system、const

主键或唯一键同常量比较

3）eq_ref

主键或唯一键被用于联接查询。

4）ref

非唯一键、非主键索引。

5）range

范围查询。

6）index

全索引扫描，需要优化。

7）ALL

全表扫描，需要优化。

5、possible_keys列

可能会使用的索引，不一定用。

6、key列

使用的索引。

7、key_len列

索引键长度=各索引键长度（字节数）之和。

CHAR(n)、VARCHAR(n):

表字符集 gbk: 2n字节、utf8: 3n字节、utf8mb4: 4n字节

变长：2字节

可空：1字节

例：表字符集utf8mb4，VARCHAR(10) 可空列的长度=4*10+2+1=43。

8、ref列

索引中列引用的列名或者常量。

9、rows列

预估扫描的数据行数。

10、Extra列

额外信息。

1）Using index

通常表示使用覆盖索引。

2）Using where

通常表示没有用索引（type为ALL）。

3）Using index condition

使用联合索引中的部分（前导列），需要回表查询。

4）Using temporary

使用临时表。

5）Using filesort

没有用索引排序，可能是内存排序或者磁盘文件排序。

查看Trace中“sort_mode”字段。

a、单路排序

"sort_mode"为<sort_key,additional_fields>或者<sort_key,packed_addtional_fields>

通过where条件查找到主键id->取出相关列(select、排序、主键等)，放入sort_buffer->对sort_buffer中数据排序->return。

b、双路(回表)排序

"sort_mode"为<sort_key,rowid>

通过where条件查找到主键id->取出排序列和主键id，放入sort_buffer->对sort_buffer中数据排序->遍历id，回原表取select列->return。

通过比较查询字段总长度与max_length_for_sort_data(默认1024字节)参数；前者小使用单路排序、前者大使用双路排序。一般单路排序性能更优，但要平衡内存占用与性能。

6）Select tables optimized away

已被查询分析器优化。

explain select max(id) from <table>;

四、Trace工具

用于分析SQL内部优化和执行过程，比较耗性能，用完后需关闭。

#开启trace
set session optimizer_trace="enabled=on",end_markers_in_json=on;
#执行查询语句
select * from xxx where a > '1' order by b;
#获取trace信息
select * from information_schema.OPTIMIZER_TRACE;
#关闭trace
set session optimizer_trace="enabled=off";

主要信息：

"analyzing_range_alternatives": { #索引使用分析
    "range_scan_alternatives": [{
        "index": "idx_a_b_c",
        "ranges": [
            "1 <= a <= 1"
        ] /* ranges */ ,
        "index_dives_for_eq_ranges": true,
        "rowid_ordered": false, #使用主键排序
        "using_mrr": false,
        "index_only": true, #是覆盖索引
        "rows": 1, #扫描函数
        "cost": 1.21, #使用成本
        "chosen": true #是否选择该索引
    }] /* range_scan_alternatives */ ,

"considered_execution_plans": [{ #执行计划
    "plan_prefix": [] /* plan_prefix */ ,
    "table": "`left_prefix`",
    "best_access_path": {
        "considered_access_paths": [{ #可能的路径
                "access_type": "ref", #访问类型
                "index": "idx_a_b_c", #索引
                "rows": 1, #扫描行数
                "cost": 1.2, #使用成本，在当前trace中对比，数值小的性能高
                "chosen": true #是否选择该索引
            },
            {
                "access_type": "range",
                "range_details": {
                    "used_index": "idx_a_b_c"
                } /* range_details */ ,
                "chosen": false,
                "cause": "heuristic_index_cheaper"
            }
        ] /* considered_access_paths */
    } /* best_access_path */ ,

"join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`left_prefix`",
                "field": "c"
              }
            ] ,
            "filesort_summary": { #文件排序汇总
              "rows": 1, #预计扫描行
              "examined_rows": 1, #排序行
              "number_of_tmp_files": 0, #0表示内存排序，大于0表示磁盘文件排序
              "sort_buffer_size": 261888, #排序缓存区大小
              "sort_mode": "<sort_key, packed_additional_fields>"#单路排序
            } /* filesort_summary */

五、性能优化

1、索引

1.1 原则

1）业务上具有唯一性的字段，需要建立唯一索引，微小性能损失可以忽略。

2）建联合索引时，区分度高的字段放最左边。最左字段区分度至少要达到0.1，大于0.9最佳。

3）varchar列在建索引时必须指定索引列字符长度，一般长度不超过20。内存足够的情况下，区分度可尽量趋近1。

#查询列的最大字符长度
select char_length(<列名>) as length, left(<列名>, <索引列字符长度>) as index_partial, <列名> as column_text from <table> order by length desc limit 100;
#最左字段区分度
select count(distinct left(<列名>, <索引列字符长度>))/count(*) from <table>; 
#不重复值区分度，用于决定要给某列加索引时，确定索引列字符长度
select count(distinct left(<列名>, <索引列字符长度>))/count(distinct <列名>) from <table>;

1.2 脚本

#查询表索引
show index from <table>;
#创建普通索引
create index idx_a_b_c on <table> (a,b,c);
#普通索引，带上索引长度
create index idx_a_b_c on <table> (a(<length>),b(<length>),c(<length>));
#创建唯一索引
create unique index uk_a on <table> (a);
#删除索引
drop index <index_name> on <table>;
#8.0 修改索引可见性，为了观察不使用索引对查询性能的影响
alter table <table_name> alter index <index_name> [visible | invisible]

2、数据准备

#创建表
CREATE TABLE `employees` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(24) NOT NULL DEFAULT '' COMMENT '姓名',
  `age` int(11) NOT NULL DEFAULT '0' COMMENT '年龄',
  `position` varchar(20) NOT NULL DEFAULT '' COMMENT '职位',
  `hire_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '入职时间',
  PRIMARY KEY (`id`),
  KEY `idx_name_age_position` (`name`,`age`,`position`) USING BTREE,
  KEY `idx_hireTime` (`hire_time`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 COMMENT='员工记录表';

CREATE TABLE `left_prefix` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  `b` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  `c` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  `d` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_a_b_c` (`a`,`b`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

CREATE TABLE `left_prefix2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  `b` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  `c` varchar(20) COLLATE utf8mb4_bin NOT NULL,
  `d` varchar(20) COLLATE utf8mb4_bin NULL,
  PRIMARY KEY (`id`),
  KEY `idx_a_b_c` (`a`,`b`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

#给表left_prefix2插入数据 
DELIMITER ;;
CREATE PROCEDURE prod_insert_left_prefix2_data()
BEGIN
DECLARE n INT DEFAULT 0;
WHILE (n <= 5000000) DO
	SET n = n + 1;
	INSERT INTO left_prefix2(a,b,c,d) values('1','2',n,'');
END WHILE;
END;
CALL prod_insert_left_prefix2_data();

3、优化建议

3.1 id列使用整形(int)自增，降低索引node大小。

3.2 NULL判断需要表扫描, 字段应设置NOT NULL加默认值,查询条件转换成 where d = '' 形式。

explain select * from left_prefix2 where a = '1' and d is null;
explain select * from left_prefix2 where a = '1' and d is not null;

3.3 索引满足最左前缀法则。

查询条件匹配索引前N个列，不跳过中间列，范围查询列或者排序(分组)条件列放最后。

执行计划：

#eq
explain select * from left_prefix where a = '1' and b = '2' and c = '3'; #使用a、b、c
explain select * from left_prefix where a = '1' and c = '3' and b = '2'; #使用a、b、c
explain select * from left_prefix where a = '1' and b = '2'; #使用a、b
explain select * from left_prefix where a = '1' ; #使用a
explain select * from left_prefix where a = '1' and c = '3'; #使用a

explain select * from left_prefix where b = '2' and c = '3';#索引扫描，性能差
explain select * from left_prefix where c = '3';#索引扫描，性能差

#like前缀匹配时，近似看成 '=' ,但 'type' = 'range'
explain select  * from left_prefix where a = '3' and b like 'xxx%' and c = '3'; #使用a、b、c
explain select  * from left_prefix where a = '3' and b like 'xxx%xx%' and c = '3'; #使用a、b、c
explain select  * from left_prefix where a = '3' and b like '%xx%' and c = '3'; #使用a,建议使用搜索引擎
explain select  * from left_prefix where a = '3' and b like '%xx' and c = '3'; #使用a,建议使用搜索引擎

#范围查询
explain select * from left_prefix where a = '1' and b = '2' and c > '3'; #使用a、b、c
explain select * from left_prefix where a = '1' and b > '2'; #使用a、b
explain select * from left_prefix where a = '1' and b in ('2'); #使用a、b
explain select * from left_prefix where a = '1' and b in ('2','3'); #非整形主键索引,一般不走索引,全表扫描
explain select a,b,c from left_prefix where a = '1' and b in ('2','3'); #使用覆盖索引优化IN

explain select * from left_prefix where a = '1' and b > '2' and c = '3'; #使用a、b
explain select a,b,c from left_prefix where a > '1'; #使用a
explain select * from left_prefix where a > '1'; #不使用索引，全表扫描更优，可参考trace
explain select * from left_prefix2 where a > '1'; #使用a,数据量大，使用索引更优
explain select a,b,c from left_prefix where a > '1' and b = '2' and c = '3'; #使用a

explain select * from left_prefix where id > 3; #回表
explain select id from left_prefix where id > 3; #覆盖索引，性能高
#能用in的不用or
explain select * from left_prefix where id in(2,3,4); #整形主键索引,并且查询数据量小的情况,一般会走索引
explain select * from left_prefix where id in(2,3,4,5,6); #整形主键索引，并且查询数据量大的情况，一般不走索引
explain select id from left_prefix where id in(2,3,4,5,6); #整形主键索引，并且查询数据量大的情况，一般不走索引, 可用覆盖索引优化

#使用in代替or。无索引时in比or性能更优。有索引时差不多。
explain select * from left_prefix2 where c in('3333332','3333334','3333336'); #无索引
explain select * from left_prefix2 where c = '3333332' or c = '3333334' or c ='3333336'; 

#排序
explain select * from left_prefix where a = '1' and b = '2' order by c ; #使用a、b, 回表
explain select a,b,c from left_prefix where a = '1' and b = '2' order by c ; #使用a、b, 覆盖索引
explain select a,b,c from left_prefix where a = '1' and b = '2' order by b,c,a ; #使用a、b, using filesort 
explain select * from employees where name='a' and age = 1 order by position; #中间字段为int时, 查询优化成order by position
explain select * from employees where name='a' and age = 1 order by position,age; #中间字段为int时, 查询优化成order by position
explain select * from employees where name='a' and age = 1 order by age,position,name; #中间字段为int时, 查询优化成order by position, 可查看trace
explain select * from left_prefix order by a ; #全表扫描, using filesort 
explain select a,b,c from left_prefix order by a ; #使用覆盖索引优化 
explain select * from left_prefix where a = '1' order by b ; #使用a
explain select * from left_prefix where a = '1' order by c ; #使用a, using filesort 单路排序
explain select * from left_prefix where a = '1' order by b asc,c asc; #使用a
explain select * from left_prefix where a = '1' order by b desc,c desc; #使用a
explain select * from left_prefix where a = '1' order by c asc,b asc; #使用a,using filesort 与索引键排序不一致
explain select * from left_prefix where a = '1' order by b desc,c asc; #使用a,using filesort 与索引键排序不一致,MySQL 8.0可创建降序索引
explain select * from left_prefix where a = '1' order by b asc,c desc; #使用a,using filesort 与索引键排序不一致

3.4 尽量使用覆盖索引，能查询具体列的不使用'select *'。

3.5 不等于(not in 或者 <>) 需要表扫描，应转换成等于形式。

explain select * from left_prefix2 where a <> 'x';
explain select * from left_prefix2 where a not in ('xx','xxxx');

3.6 使用in代替or，无索引时in比or性能更优。

explain select * from left_prefix2 where id in(3333332,3333334,3333336); #有索引时差不多
explain select * from left_prefix2 where id = 3333332 or id = 3333334 or id =3333336; 

explain select * from left_prefix2 where c in('3333332','3333334','3333336');#无索引时in更优
explain select * from left_prefix2 where c = '3333332' or c = '3333334' or c ='3333336';

3.7 索引列使用函数、类型转换、计算操作不走索引。

explain select * from left_prefix2 where left(a,1) = '1';
explain select * from left_prefix2 where date(a) = '2020-12-12';
explain select * from left_prefix2 where a = 1;
explain select * from left_prefix2 where a = '1'+'2';

3.8 分页查询优化

select * from left_prefix2 where a > '0' limit 1000,15;#跳过较少，扫描数少
select * from left_prefix2 where a > '0' limit 2000000,15;#跳过较多，扫描数多
#排序后分页查询，遍历后还要回表查询，走全表扫描，Using filesort
explain select * from left_prefix2 where a > '0' order by a limit 2000000,15;

#优化方式一，1、取出满足条件的id，走覆盖索引；2、根据id查询数据；
explain select id from left_prefix2 where a > '0' order by a limit 2000000,15;
explain select * from left_prefix2 l inner join (select id from left_prefix2 where a > '0' order by a limit 2000000,15) tmp on l.id = tmp.id;

#优化方式二，2、通过索引范围查询达到分页目的 
explain select * from employees where hire_time > '2020-12-12' order by hire_time desc limit 15;

3.9 表连接优化

1) Nested-Loop Join (NLJ)嵌套循环连接算法

被驱动表关联字段有索引。一般小表(参与连接的行少)作为驱动表，大表作为被驱动表。驱动表全表扫描，循环读取一行，使用联接字段查询被驱动表，取出结果合并，返回给客户端。因为被驱动表走索引，性能较高。

explain select * from left_prefix t1 join left_prefix2 t2 on t1.a = t2.a;
explain select * from left_prefix2 t2 join left_prefix t1 on t1.a = t2.a;#替换表顺序不影响查询效率

2) Block Nested-Loop Join(BNL)基于块的嵌套循环连接算法

被驱动表关联字段无索引。一般小表作为驱动表，大表作为被驱动表。驱动表数据加载到join_buffer中，循环读取一行被驱动表数据，与join_buffer中数据比对，取出结果合并，返回给客户端。相当于在内存中做笛卡尔积，性能较低。

explain select * from left_prefix t1 join left_prefix2 t2 on t1.b = t2.b;#Using join buffer (Block Nested Loop)
explain select * from left_prefix2 t2 join left_prefix t1 on t1.b = t2.b;#替换表顺序不影响查询效率

优化方式：

1）给被驱动表关联字段加索引。

2）小表连接不超过3张，数据量大的表尽量不做连接。

*）关联字段类型需一致，否则不走索引。

3.10 count查询优化

查询分析器对count查询会做优化，优先走非聚集索引，因为叶子节点不存储数据，数据量小，扫描效率较高。

UPDATE `left_prefix2` SET `d` = null WHERE (id = 1 or id = 2) ;
create index idx_d on left_prefix2 (d);
explain select count(1) from left_prefix2; #走非主键索引idx_d 5000005
explain select count(id) from left_prefix2; #走非主键索引idx_d 5000005 
explain select count(a) from left_prefix2; #走非主键索引idx_a_b_c 5000005 ,索引键不可空，总数正确
explain select count(d) from left_prefix2; #走非主键索引idx_d 5000003,索引键可空，不计算NULL值
explain select count(*) from left_prefix2; #走非主键索引idx_d 5000005，内部优化，推荐使用

优化方式：

1）使用count(*)查询，兼顾性能和数据准确性。

2）增加计数表，数据增删时，在事务中同步计数。

五、运维优化

脚本：

#查询连接
show full processlist;

#查询事务
select * from information_schema.INNODB_TRX;
#杀死死锁事务
kill [trx_mysql_thread_id]