Mysql学习笔记:eq_range_index_dive_limit参数

在学习mysql统计信息和执行计划的过程中,发现了一个挺有意思的问题。就是mysql优化器在没有创建直方图的情况下,在评估rows上能做到非常精准。

oracle CBO在评估rows时,如果没有非常准的频率直方图,是无法做的精准的rows评估的。因为oracle在对象存在统计信息的情况下,所有cost、rows评估都是完全依赖统计信息的。其中rows的评估依赖dba_tables.num_rows、dba_tab_columns.num_distinct、num_nulls、density、low_value、high_value等等。

看一个oracle简单的例子

可以看到c=’C’或’D’都与实际值相差较多。实际上对于rows评估,oracle是完全依赖于统计信息的。

  • c=’C’落在LOW_VALUE和HIGH_VALUE之间,rows评估公式如下:

所以rows=1050/3=350

  • c=’D’在LOW_VALUE和HIGH_VALUE之外,rows评估公式如下:

所以rows=1050/2/3=175

可以看出oracle在字段数据倾斜很严重的时候,rows评估会造成比较大的偏差,所以才会有直方图的出现。即使有直方图,oracle也无法非常精准的评估出rows,因为CBO是完全依赖统计信息的。

下面来看看mysql的测试:

在没有直方图的情况下,可以看到rows的评估非常精准。

从统计信息中也无法做出如此精准的判断,这是什么原因呢?

原因就在于mysql在没有直方图之前,如果数据倾斜严重的情况下,容易走错执行计划。所以引入了一个index dive的特性。官方文档介绍如下:

Equality Range Optimization of Many-Valued Comparisons

Consider these expressions, where col_name is an indexed column:

  • col_name IN(val1, …, valN)
  • col_name = val1 OR … OR col_name = valN

Each expression is true if col_name is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:

  • If there is a unique index on col_name, the row estimate for each range is 1 because at most one row can have the given value.
  • Otherwise, any index on col_name is nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.

With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.

Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.

The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. To disable use of statistics and always use index dives regardless of N, set eq_range_index_dive_limit to 0.

To update table index statistics for best estimates, use ANALYZE TABLE.

Prior to MySQL 8.0, there is no way of skipping the use of index dives to estimate index usefulness, except by using the eq_range_index_dive_limit system variable. In MySQL 8.0, index dive skipping is possible for queries that satisfy all these conditions:

  • The query is for a single table, not a join on multiple tables.
  • A single-index FORCE INDEX index hint is present. The idea is that if index use is forced, there is nothing to be gained from the additional overhead of performing dives into the index.
  • The index is nonunique and not a FULLTEXT index.
  • No subquery is present.
  • No DISTINCTGROUP BY, or ORDER BY clause is present.

For EXPLAIN FOR CONNECTION, the output changes as follows if index dives are skipped:

  • For traditional output, the rows and filtered values are NULL.
  • For JSON output, rows_examined_per_scan and rows_produced_per_join do not appear, skip_index_dive_due_to_force is true, and cost calculations are not accurate.

Without FOR CONNECTIONEXPLAIN output does not change when index dives are skipped.

After execution of a query for which index dives are skipped, the corresponding row in the Information Schema OPTIMIZER_TRACE table contains an index_dives_for_range_access value of skipped_due_to_force_index.

该特性的意思是mysql在评估满足index dive生效条件的执行计划时,会实际去访问一些page,从而获取更加精准的rows。以便得到正确的执行计划。并不是完全依靠统计信息。

index dive的参数由eq_range_index_dive_limit控制。

  • 0 :始终使用index dive的方式。
  • 1 :不使用index dive的方式。
  • N :1+N 条件个数。

index dive的适用条件:

  • 非唯一索引字段的in或者or,col_name IN (val1, ..., valN)/col_name = val1 OR … OR col_name = valN
  • 非唯一索引字段的(< > <= >=)的范围扫描
  • 非唯一索引字段的等值查询col_name=val

index dive的限制:

在 MySQL 8.0 之前,只能使用eq_range_index_dive_limit跳过index dive。在 MySQL 8.0 中,满足这些条件的查询可能会跳过index dive:

  • 多表关联
  • 使用了FORCE INDEX HINT
  • 存在子查询
  • 存在DISTINCT, GROUP BY, or ORDER BY
  • 索引为全文索引

通过optimizer_trace验证是否使用了index dive:

后续小菜鸟还初探了一下源码,可以看到每次index dive采样的page最多10个page,以及rows评估的公式。

  • N_PAGES_READ_LIMIT 10
  • n_rows = n_rows_on_prev_level * n_rows / n_pages_read

 

 

 

 

此条目发表在Mysql分类目录,贴了, 标签。将固定链接加入收藏夹。

发表回复

您的电子邮箱地址不会被公开。 必填项已用*标注