================================================================== MySQL Internals ~ ================================================================== Storage Engine Interface ~ Build, Run, Debug ~ Optimizer ~ ================================================================== Storage Engine Interface ~ ================================================================== Abstracted Storage Engine Interface (Table Handler): This module is abstract class -- handler and a structure handlerton. (?) - handlerton added in 5.1 for plugin integration -- provides standard interface. - defined in sql/handler.h and sql/handler.cc (partial implementation) - Storage Engines: MyISAM, InnoDB, Memory, BerkeleyDB ================================================================== Build, Run, Debug ~ ================================================================== - Steps to build: copy ./BUILD/compile-pentium-debug to compile-generic-debug (if needed) Remove $pentium_cflags in the script and run it: ./BUILD/compile-generic-debug --prefix=/home/thava/mysql/install make install; will install binaries in your directory. - To run only one test: Create ./mysql-test/t/example.test with your sql query in it. cd mysql-test ./mysql-test-run --record example - To run only test suite: Create ./mysql-test/suite/backup/t/example.test with your sql query ./mysql-test-run --suite=backup example - Best way to debug: cd mysql-test mysql-test-run --manual-gdb example gdb -cd /home/thava/mysql/dev/mysql-6.0/install/mysql-test -x /home/thava/mysql/dev/mysql-6.0/install/mysql-test/var/tmp/gdbinit.master_0 \ /home/thava/mysql/dev/mysql-6.0/install/libexec/mysqld (or mysql-test-run --gdb example ; for xterm with gdb) - After the example.test query completes, you can still debug using gdb-- you can directly connect to mysqld from another client on port 9306 : ./client/mysql -uroot -host=127.0.0.1 -port=9306 test - gdb commands: You can abbreviate all commands: b - breakpoints; info breakpoints ; enable ; disable where ; backtrace; bt; - print stack trace; c - continue; next; step; list info local ; info about local variables; info threads; info line [main | mi_open.c:83] up ; down; up - help ("help condition"); run (=r); break (=b); continue (=c); condition; next (=n); step (=s); finish; delete break; thread; print (=p); whatis; ptype; display; watch; kill (=k); - Ways to debug: --debug option: cd mysql-test; mtr --debug (same as --debug=d:t:i:o,/tmp/mysqld.trace ) mysqld --debug=d,info,error,query,general,where:O,/tmp/mysqld.trace Recommended: mysqld --debug=d:F:L:O,/tmp/thava-trace.txt Empty list for d accepts all tags. mysql> set debug="d:O,/tmp/thava-trace.txt" ; #Note: "d" specifies to debug all Flag Description d d,keyword1,keyword2 : outputs DBUG_PRINT(keyword, arglist) only for matching keywords! D -#D,20 specifies a delay of two seconds after each debug print f -f,main,subr1 enables debug only for main, subr1 functions. F print the source file name L Identify the source file line number i print PID g Enable profiling output: dbugmon.out ; use: g,main,myfunc to limit to functions. n Print the current function nesting depth N Number each line of debug output. o o,/tmp/mytrace.out redirects the output. default: /tmp/mysqld.trace O Flush trace file for each write. p Limit to the processe. Synopsis: --debug=p,mysqld ; DBUG_PROCESS("mysqld") P Print the current process name for each line of debug or trace output. r After DBUG_PUSH(), start from fresh nesting level. S Do function _sanity(_file_,_line_) at each debugged function until _sanity() returns something that differs from 0. (Mostly used with safemalloc to find memory leaks) t Enable function call/exit trace lines. t,20 limits nesting level to 20. Example: -#d:t -#d:f,main,subr1:F:L:t,20 -#d,input,output,files:n -#d:t:i:O,\\mysqld.trace --debug=d:t:i:O,/tmp/thava-trace.txt common tags to print (with d option) : enter, exit, error, warning, info, and loop. Use: --debug=d,thava:F:L and keep using DBUG_PRINT("thava", (fmt, arg1,arg2)); DBUG_PUSH("d:t"); ==> sets the debug options state to this new state. Recommended tags: info,error,warning,query,general,where,thava:F:L:t:i:O,/tmp/thava-mysqld.trace By running following command in sql dir, we get following debug tags: grep DBUG_PRINT *.cc *.h | sed -e 's/.*DBUG_PRINT[ \t]*("\([^"]*\)".*/\1/' | sort | uniq admin alter_info arena best compute_next_execution_time connectstring counts debug delayed enter error exec_event exec_query execute exit general hidden INDEX VALUES info info: info-in-hash list_invariants load_from_row lock loop master_info my myrg MYSQL_TYPE_BIT NDB_SHARE ndb_util_thread note plugin prep_query proc_info qcache query quit read_event rename_share ret_info return signal skip_event sleep table tcache test thd->options tmptable trans uuid value wait warning - Valgrind use: To debug: myprog arg1 arg2 run: valgrind myprog arg1 arg2 run: valgrind --leak-check=yes myprog arg1 arg2 ; For detailed memleak check. It contains memcheck, cachegrind, callgrind, etc. Default is memcheck. - Be careful to use appropriate mutex/lock before accessing global variables. - Use of sql_alloc() preferred over my_malloc(); sql_alloc() allocates from predefined pool for the current query. At the end of query it will be automatically freed. For large allocation, use my_malloc() and free it using my_free() -- they internally use malloc() and free() -- allocations persist across queries. - Don't use exceptions, STL, iostream anything that requires libstdc++ ================================================================== Optimizer ~ ================================================================== 2 The Optimizer Definitions This description uses a narrow definition: The OPTIMISER is the set of routines which decide what execution path the DBMS should take for queries. MySQL changes these routines frequently, so you should compare what is said here with what?s in the current source code. To make that easy, this description includes notes referring to the relevant routine, for example "See: ?/sql/select_cc?, optimize_cond()". When one query is changed into another query which delivers the same result, that is a TRANSFORMATION. For example, the DBMS could change SELECT ... WHERE 5 = a to SELECT ...WHERE a = 5 Most transformations are less obvious. Some transformations result in faster execution. The Code Here is a diagram showing the code structure of handle_select() in ?/sql/sql_select.cc?, the server code that handles a query: handle_select() mysql_select() JOIN::prepare() setup_fields() JOIN::optimize() /* optimizer is from here ... */ optimize_cond() opt_sum_query() make_join_statistics() get_quick_record_count() choose_plan() /* Find the best way to access tables */ /* as specified by the user. */ optimize_straight_join() best_access_path() /* Find a (sub-)optimal plan among all or subset */ /* of all possible query plans where the user */ /* controlls the exhaustiveness of the search. */ greedy_search() best_extension_by_limited_search() best_access_path() /* Perform an exhaustive search for an optimal plan */ find_best() make_join_select() /* ... to here */ JOIN::exec() The indentation in the diagram shows what calls what. Thus you can see that handle_select() calls mysql_select() which calls JOIN::prepare() which calls setup_fields(), and so on. The first part of mysql_select() is JOIN::prepare() which is for context analysis, metadata setup, and some subquery transformations. The optimizer is JOIN::optimize() and all its subordinate routines. When the optimizer finishes, JOIN::exec() takes over and does the job that JOIN::optimize() decides upon. Although the word ?JOIN? appears, these optimizer routines are for all query types. 10 MySQL Internals Manual for version 5.0.1-alpha. The optimize_cond() and opt_sum_query() routines do transformations. The make_join_ statistics() routine puts together all the information it can find about indexes that might be useful for accessing the query?s tables. Constant Propagation A transformation takes place for expressions like this: WHERE column1 = column2 AND column2 = ?x? For such expressions, since it is known that ?if A=B and B=C then A=C? (the Transitivity Law), the transformed condition becomes: WHERE column1=?x? AND column2=?x? This transformation occurs for column1 column2 conditions if and only if is one of these operators: =, <, >, <=, >=, <>, <=>, LIKE That is, transitive transformations don?t apply for BETWEEN. Probably they should not apply for LIKE either, but that?s a story for another day. Constant propagation happens in a loop, so the output from one ?propagation step? can be input for the next step. See: ?/sql/sql_select.cc?, change_cond_ref_to_const(). Or see: ?/sql/sql_select.cc?, propagate_cond_constants(). Dead Code Elimination A transformation takes place for always-true conditions: WHERE 0=0 AND column1=?y? The first condition is always true, so it is removed, leaving: WHERE column1=?y? See: ?/sql/sql_select.cc?, remove_eq_conds(). A transformation takes place for always-false conditions: WHERE (0 = 1 AND s1 = 5) OR s1 = 7 The parenthesized part is always false, so it is removed, reducing the expression above to: WHERE s1 = 7 Sometimes the optimizer might eliminate the whole WHERE clause: WHERE (0 = 1 AND s1 = 5) The EXPLAIN statement will show the words Impossible WHERE. Informally, we at MySQL say: ?The WHERE has been optimized away.? If a column cannot be NULL, the optimizer removes any non-relevant IS NULL conditions. Thus, WHERE not_null_column IS NULL is an always-false situation, and WHERE not_null_column IS NOT NULL is an always-true situation ? so such columns are also eliminated from the conditional expression. This can be tricky. For example, in an OUTER JOIN, a column which is defined as NOT NULL might still contain a NULL. The optimizer leaves IS NULL conditions alone in such exceptional situations. The optimizer will not detect all possible impossible WHERE situations ? there are too many. For example: Chapter 2: The Optimizer 11 CREATE TABLE Table1 (column1 CHAR(1)); ... SELECT * FROM Table1 WHERE column1 = ?Canada?; The optimizer will not eliminate the condition in the query, even though the CREATE TABLE definition makes it an impossible condition. Constant Folding A transformation takes place for this expression: WHERE column1 = 1 + 2 which becomes: WHERE column1 = 3 Before you say ?but I never would write 1 + 2 in the first place? ? remember what was said earlier about constant propagation. It is quite easy for the optimizer to put such expressions together. This process simplifies the result. Constants and Constant Tables A MySQL ?constant? is something more than a mere literal in the query. It can also be the contents of a ?constant table,? which is defined as follows: 1. A table with zero rows, or with only one row 2. A table expression that is restricted with a WHERE condition, containing expressions of the form column = constant, for all the columns of the table?s PRIMARY KEY, or for all the columns of any of the table?s UNIQUE keys (provided that the UNIQUE columns are also defined as NOT NULL). For example, if the table definition for Table0 contains ... PRIMARY KEY (column1,column2) then this expression FROM Table0 ... WHERE column1=5 AND column2=7 ... returns a constant table. More simply, if the table definition for Table1 contains ... unique_not_null_column INT NOT NULL UNIQUE then this expression FROM Table1 ... WHERE unique_not_null_column=5 returns a constant table. These rules mean that a constant table has at most one row value. MySQL will evaluate a constant table in advance, to find out what that value is. Then MySQL will ?plug? that value into the query. Here?s an example: SELECT Table1.unique_not_null_column, Table2.any_column FROM Table1, Table2 WHERE Table1.unique_not_null_column = Table2.any_column AND Table1.unique_not_null_column = 5; When evaluating this query, MySQL first finds that table Table1 ? after restriction with Table1.unique_not_null_column ? is a constant table according to the second definition above. So it retrieves that value. If the retrieval fails (there is no row in the table with unique_not_null_column = 5), then the constant table has zero rows and you will see this message if you run EXPLAIN for the statement: Impossible WHERE noticed after reading const tables Alternatively, if the retrieval succeeds (there is exactly one row in the table with unique_not_ null_column = 5), then the constant table has one row and MySQL transforms the query to this: 12 MySQL Internals Manual for version 5.0.1-alpha. SELECT 5, Table2.any_column FROM Table1, Table2 WHERE 5 = Table2.any_column AND 5 = 5; Actually this is a grand-combination example. The optimizer does some of the transformation because of constant propagation, which we described earlier. By the way, we described constant propagation first because it happens happens before MySQL figures out what the constant tables are. The sequence of optimizer steps sometimes makes a difference. Although many queries have no constant-table references, it should be kept in mind that whenever the word ?constant? is mentioned hereafter, it refers either to a literal or to the contents of a constant table. See: ?/sql/sql_select.cc?, make_join_statistics(). Join Type When evaluating a conditional expression, MySQL decides what ?join type? the expression has. (Again: despite the word ?join,? this applies for all conditional expressions, not just join expressions. A term like ?access type? would be clearer.) These are the documented join types, in order from best to worst: system ... a system table which is a constant table const ... a constant table eq_ref ... unique/primary index with ?=? for joining ref ... index with ?=? ref_or_null ... index with ?=?, possibly NULL range ... index with BETWEEN, IN, >=, LIKE, etc. index ... sequential scan of index ALL ... sequential scan of table See: ?/sql/sql_select.h?, enum join_type{}. Notice that there are a few other (undocumented) join types too, for subqueries. The optimizer can use the join type to pick a ?driver expression.? For example, consider this query: SELECT * FROM Table1 WHERE indexed_column = 5 AND unindexed_column = 6 Since indexed_column has a better join type, it is more likely to be the driver. You?ll see various exceptions as this description proceeds, but this is a simple first rule. What is significant about a driver? Consider that there are two execution paths for the query: (The Bad Execution Path) Read every row in the table. (This is called a ?sequential scan of Table1? or just ?table scan.?) For each row, examine the values in indexed_column and in unindexed_column, to see if they meet the conditions. (The Good Execution Plan) Via the index, look up the rows which have indexed_column = 5. (This is called an ?indexed search.?) For each row, examine the value in unindexed column to see if it meets the condition. An indexed search generally involves fewer accesses than a sequential scan, and far fewer accesses if the table is large but the index is UNIQUE. That is why it is better to access with ?The Good Execution Plan,? and that is why it is often good to choose indexed column as the driver. The ?range? Join Type Some conditions can work with indexes, but over a (possibly wide) range of keys. These are known as ?range? conditions, and are most often encountered with expressions involving these operators: >, >=, <, <=, IN, LIKE, BETWEEN To the optimizer, this expression: Chapter 2: The Optimizer 13 column1 IN (1,2,3) is the same as this one: column1 = 1 OR column1 = 2 OR column1 = 3 and MySQL treats them the same ? there is no need to change IN to OR for a query, or vice versa. The optimizer will use an index (range search) for column1 LIKE ?x%? but not for column1 LIKE ?%x? That is, there is no range search if the first character in the pattern is a wildcard. To the optimizer, column1 BETWEEN 5 AND 7 is the same as this expression column1 >= 5 AND column1 <= 7 and again, MySQL treats both expressions the same. The optimizer may change a Range to an ALL join type if a condition would examine too many index keys. Such a change is particularly likely for < and > conditions and multiple-level secondary indexes. See: (for MyISAM indexes) ?/myisam/mi_range.c?, mi_records_in_range(). The index Join Type Consider this query: SELECT column1 FROM Table1; If column1 is indexed, then the optimizer may choose to retrieve the values from the index rather than from the table. An index which is used this way is called a ?covering index? in most texts. MySQL just uses the word ?index? in EXPLAIN descriptions. For this query: SELECT column1, column2 FROM Table1; the optimizer will use ?join type = index? only if the index has this definition: CREATE INDEX ... ON Table1 (column1, column2); In other words, all columns in the select list must be in the index. (The order of the columns in the index does not matter.) Thus it might make sense to define a multiple-column index strictly for use as a covering index, regardless of search considerations. Transposition MySQL supports transpositions (reversing the order of operands around a relational operator) for simple expressions only. In other words: WHERE - 5 = column1 becomes: WHERE column1 = -5 However, MySQL does not support transpositions where arithmetic exists. Thus: WHERE 5 = -column1 is not treated the same as: WHERE column1 = -5 Transpositions to expressions of the form = are ideal for index lookups. If an expression of this form refers to an indexed column, then MySQL always uses the index, regardless of the table size. (Exception: if the table has only zero rows or only one row, it is a 14 MySQL Internals Manual for version 5.0.1-alpha. constant table and receives special treatment. See the earlier section "Constants and Constant Tables".) AND The ANDed search has the form AND , as in: WHERE column1 = ?x? AND column2 = ?y? Here the optimizer?s decision is: 1. If (neither condition is indexed) use sequential scan. 2. Otherwise, if (one condition has better join type) then pick a driver based on join type (see the earlier section "Join Type"). 3. Otherwise, since (both conditions are indexed and have equal join type) pick a driver based on the first index that was created. Here?s an example: CREATE TABLE Table1 (s1 INT, s2 INT); CREATE INDEX Index1 ON Table1 (s2); CREATE INDEX Index2 ON Table1 (s1); ... SELECT * FROM Table1 WHERE s1 = 5 AND s2 = 5; When choosing a strategy to solve this query, the optimizer picks s2 = 5 as the driver because the index for s2 was created first. Regard this as an accidental effect rather than a rule ? it could change at any moment. OR The ORed search has the form " OR " as in: WHERE column1 = ?x? OR column2 = ?y? Here the optimizer?s decision is: Use a sequential scan. In theory there is another choice if both column1 and column2 are indexed: index search for the first condition, index search for the second condition, merge the two result sets MySQL never does that. But MySQL will do that in future, the code for doing so already exists. The above warning does not apply if the same column is used in both conditions. For example: WHERE column1 = ?x? OR column1 = ?y? In such a case, the search is indexed because the expression is a range search. This subject will be revisited during the discussion of the IN predicate. AND plus OR Consider this search expression: WHERE column1 = 5 AND (column2 = 5 OR column3 = 5) If column1 is indexed, then the optimizer will choose to use the index. In other words, the ?sequential scan if OR? rule does not apply if the OR is subordinate to an AND. UNION All SELECT statements within a UNION are optimized separately. Therefore, for this query: SELECT * FROM Table1 WHERE column1 = ?x? UNION ALL SELECT * FROM TABLE1 WHERE column2 = ?y? Chapter 2: The Optimizer 15 if both column1 and column2 are indexed, then each SELECT is done using an indexed search, and the result sets are merged. Notice that this query might produce the same results as the query used in the OR example, which uses a sequential scan. NOT, <> It is a logical rule that column1 <> 5 is the same as column1 < 5 OR column1 > 5 However, MySQL does not transform in this circumstance. If you think that a range search would be better, then you should do your own transforming in such cases. It is also a logical rule that WHERE NOT (column1 != 5) is the same as WHERE column1 = 5 However, MySQL does not transform in this circumstance either. We expect to add optimizations soon for both the above cases. ORDER BY In general, the optimizer will skip the sort procedure for the ORDER BY clause if it sees that the rows will be in order anyway. But let?s examine some exceptional situations. For the query: SELECT column1 FROM Table1 ORDER BY ?x?; the optimizer will throw out the ORDER BY clause. This is another example of dead code elimination. For the query: SELECT column1 FROM Table1 ORDER BY column1; the optimizer will use an index on column1, if it exists. For the query: SELECT column1 FROM Table1 ORDER BY column1+1; the optimizer will use an index on column1, if it exists. But don?t let that fool you! The index is only for finding the values. (It?s cheaper to do a sequential scan of the index than a sequential scan of the table, that?s why index is a better join type than ALL ? see "The ?index? Join Type" section, earlier.) There will still be a full sort of the results. For the query: SELECT * FROM Table1 WHERE column1 > ?x? AND column2 > ?x? ORDER BY column2; if both column1 and column2 are indexed, the optimizer will choose an index on ... column1. The fact that ordering takes place by column2 values does not affect the choice of driver in this case. See: ?/sql/sql_select.cc?, test_if_order_by_key(), and ?/sql/sql_select.cc?, test_if_ skip_sort_order(). There is a description of the internal sort procedure in the MySQL Reference Manual, in section 5.2.8 ?How MySQL optimizes ORDER BY.? We will not repeat it here, but urge you to read it because it includes a description of how the buffering and the quicksort operate. See: ?/sql/sql_select.cc?, create_sort_index(). 16 MySQL Internals Manual for version 5.0.1-alpha. GROUP BY These are the main optimizations that take place for GROUP BY and related items (HAVING, COUNT(), MAX(), MIN(), SUM(), AVG(), DISTINCT()). ? GROUP BY will use an index, if one exists. ? GROUP BY will use sorting, if there is no index. The optimizer may choose to use a hash table. ? For the case GROUP BY x ORDER BY x, the optimizer will realize that the ORDER BY is unnecessary, because the GROUP BY comes out in order by x. ? The optimizer contains code for shifting certain HAVING conditions to the WHERE clause; however, this code is not operative at time of writing. See: ?/sql/sql_select.cc?, JOIN::optimize(), after #ifdef HAVE_REF_TO_FIELDS. ? If the table handler has a quick row-count available, then the query SELECT COUNT(*) FROM Table1; gets the count without going through all the rows. This is true for MyISAM tables, but not for InnoDB tables. Note that the query SELECT COUNT(column1) FROM Table1; is not subject to the same optimization, unless column1 is defined as NOT NULL. ? New optimizations exist for MAX() and MIN(). For example, consider the query SELECT MAX(column1) FROM Table1 WHERE column1 < ?a?; If column1 is indexed, then it?s easy to find the highest value by looking for ?a? in the index and going back to the key before that. ? The optimizer transforms queries of the form SELECT DISTINCT column1 FROM Table1; to SELECT column1 FROM Table1 GROUP BY column1; if and only if both of these conditions are true: ? The GROUP BY can be done with an index. (This implies that there is only one table in the FROM clause, and no WHERE clause.) ? There is no LIMIT clause. Because DISTINCT is not always transformed to GROUP BY, do not expect that queries with DISTINCT will always cause ordered result sets. (You can, however, rely on that rule with GROUP BY, unless the query includes ORDER BY NULL.) See: ?/sql/sql_select.cc?, opt_sum_query(), and ?/sql/sql_select.cc?, remove_ duplicates(). JOIN Bad join choices can cause more damage than bad choices in single-table searches, so MySQL developers have spent proportionally more time making sure that the tables in a query are joined in an optimal order and that optimal access methods (often called ?access paths?) are chosen to retrieve table data. A combination of a fixed order in which tables are joined and the corresponding table access methods for each table is called ?query execution plan? (QEP). The goal of the query optimizer is to find an optimal QEP among all possible such plans. There are several general ideas behind join optimization. Each plan (or part of plan) is assigned a ?cost.? The cost of a plan reflects roughly the resources needed to compute a query according to the plan, where the main factor is the number of rows Chapter 2: The Optimizer 17 that will be accessed while computing a query. Once we have a way to assign costs to different QEPs we have a way to compare them. Thus, the goal of the optimizer is to find a QEP with minimal cost among all possible plans. In MySQL, the search for an optimal QEP is performed in a bottom-up manner. The optimizer first considers all plans for one table, then all plans for two tables, and so on, until it builds a complete optimal QEP. Query plans that consist of only some of the tables (and predicates) in a query are called ?partial plans.? The optimizer relies on the fact that the more tables are added to a partial plan, the greater its cost. This allows the optimizer to expand with more tables only the partial plans with lower cost than the current best complete plan. The key routine that performs the search for an optimal QEP is ?sql/sql_select.cc?, find_ best(). It performs an exhaustive search of all possible plans and thus guarantees it will find an optimal one. Below we represent find_best() in an extremely free translation to pseudocode. It is recursive, so some input variables are labeled ?so far? to indicate that they come from a previous iteration. remaining_tables = {t1, ..., tn}; /* all tables referenced in a query */ procedure find_best( partial_plan in, /* in, partial plan of tables-joined-so-far */ partial_plan_cost, /* in, cost of partial_plan */ remaining_tables, /* in, set of tables not referenced in partial_plan */ best_plan_so_far, /* in/out, best plan found so far */ best_plan_so_far_cost)/* in/out, cost of best_plan_so_far */ { for each table T from remaining_tables { /* Calculate the cost of using table T. Factors that the optimizer takes into account may include: Many rows in table (bad) Many key parts in common with tables so far (very good) Restriction mentioned in the WHERE clause (good) Long key (good) Unique or primary key (good) Full-text key (bad) Other factors that may at some time be worth considering: Many columns in key Short average/maximum key length Small table file Few levels in index All ORDER BY / GROUP columns come from this table */ cost = complex-series-of-calculations; /* Add the cost to the cost so far. */ partial_plan_cost+= cost; if (partial_plan_cost >= best_plan_so_far_cost) /* partial_plan_cost already too great, stop search */ continue; partial_plan= expand partial_plan by best_access_method; remaining_tables= remaining_tables - table T; if (remaining_tables is not an empty set) { 18 MySQL Internals Manual for version 5.0.1-alpha. find_best(partial_plan, partial_plan_cost, remaining_tables, best_plan_so_far, best_plan_so_far_cost); } else { best_plan_so_far_cost= partial_plan_cost; best_plan_so_far= partial_plan; } } } Here the optimizer applies a ?depth-first search algorithm.? It tries estimates for every table in the FROM clause. It will stop a search early if the estimate becomes worse than the best estimate so far. The order of scanning will depend on the order that the tables appear in the FROM clause. See: ?/sql/table.h?, struct st_table. ANALYZE TABLE may affect some of the factors that the optimizer considers. See also: ?/sql/sql_sqlect.cc?, make_join_statistics(). The straightforward use of find_best() and greedy_search() will not apply for LEFT JOIN or RIGHT JOIN. For example, starting with MySQL 4.0.14, the optimizer may change a left join to a straight join and swap the table order in some cases. See also ?5.2.7 How MySQL Optimises LEFT JOIN and RIGHT JOIN? in the MySQL Reference Manual. 2.1 The Index Merge Join Type 2.1.1 Overview Index Merge is used when table condition can be converted to form: cond_1 OR cond_2 ... OR cond_N The conditions for conversion are that each cond_i can be used for a range scan, and no pair (cond_i, cond_j) uses the same index. (If cond_i and cond_j use the same index, then cond_i OR cond_j can be combined into a single range scan and no merging is necessary.) For example, Index Merge can be used for the following queries: SELECT * FROM t WHERE key1=c1 OR key2= max_val. MySQL is able to convert arbitrary-depth nested AND-OR conditions to the above conjunctive form. 2.1.2.2 Index Merge Optimizer A single SEL_TREE object cannot be constructed for conditions that have different members of keys in the OR clause, like in condition: key1 < c1 OR key2 < c2 Beginning with MySQL 5.0, these conditions are handled with the Index Merge method, and its range optimizer structure, class SEL_IMERGE. SEL_IMERGE represents a disjunction of several SEL_TREE objects, which can be expressed as: sel_imerge_cond = (t_1 OR t_1 OR ... OR t_n) where each of t_i stands for a SEL_TREE object, and no pair (t_i, t_j) of distinct SEL_TREE objects can be combined into single SEL_TREE object. The current implementation builds SEL_IMERGE only if no single SEL_TREE object can be built for the part of the query condition it has analyzed, and discards SEL_TREE immediately if it discovers that a single SEL_TREE object can be constructed. This is actually a limitation, and can cause worse row retrieval strategy to be used. E.g. for query: SELECT * FROM t WHERE (goodkey1=c1 OR goodkey1=c2) AND badkey=c3 scan on badkey will be chosen even if Index Merge on (goodkey1, goodkey) would be faster. The Index Merge optimizer collects a list of possible ways to access rows with Index Merge. This list of SEL_IMERGE structures represents the following condition: (t_11 OR t_12 OR ... OR t_1k) AND (t_21 OR t_22 OR ... OR t_2l) AND ... AND (t_M1 OR t_M2 OR ... OR t_mp) where t_ij is one SEL_TREE and one line is for one SEL_IMERGE object. The SEL_IMERGE object with minimal cost is used for row retrieval. In ?sql/opt_range.cc?, see imerge_list_and_list(), imerge_list_or_list(), and SEL_ IMERGE class member functions for more details of Index Merge construction. 20 MySQL Internals Manual for version 5.0.1-alpha. See the get_index_merge_params function in the same file for Index Merge cost calculation algorithm. 2.1.3 Row Retrieval Algorithm Index Merge works in two steps: Preparation step: activate ?index only?; foreach key_i in (key_scans \ clustered_pk_scan) { while (retrieve next (key, rowid) pair from key_i) { if (no clustered PK scan || row doesn?t match clustered PK scan condition) put rowid into Unique; } } deactivate ?index only?; Row retrieval step: for each rowid in Unique { retrieve row and pass it to output; } if (clustered_pk_scan) { while (retrieve next row for clustered_pk_scan) pass row to output; } See: ?sql/opt_range.cc?, QUICK_INDEX_MERGE_SELECT class members for Index Merge row retrieval code. Chapter 3: Important Algorithms and Structures 21 3 Important Algorithms and Structures MySQL uses many different algorithms and structures. This chapter tries to describe some of them. 3.1 How MySQL Does Sorting (filesort) In those cases where MySQL must sort the result, it uses the following filesort algorithm before MySQL 4.1: 1. Read all rows according to key or by table scanning. Rows that don?t match the WHERE clause are skipped. 2. For each row, store a pair of values in a buffer (the sort key and the row pointer). The size of the buffer is the value of the sort_buffer_size system variable. 3. When the buffer gets full, run a qsort (quicksort) on it and store the result in a temporary file. Save a pointer to the sorted block. (If all pairs fit into the sort buffer, no temporary file is created.) 4. Repeat the preceding steps until all rows have been read. 5. Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file. 6. Repeat the following until there are fewer than MERGEBUFF2 (15) blocks left. 7. On the last multi-merge, only the pointer to the row (the last part of the sort key) is written to a result file. 8. Read the rows in sorted order by using the row pointers in the result file. To optimize this, we read in a big block of row pointers, sort them, and use them to read the rows in sorted order into a row buffer. The size of the buffer is the value of the read_rnd_buffer_size system variable. The code for this step is in the ?sql/records.cc? source file. One problem with this approach is that it reads rows twice: One time when evaluating the WHERE clause, and again after sorting the pair values. And even if the rows were accessed successively the first time (for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered, but the row positions are not.) In MySQL 4.1 and up, a filesort optimization is used that records not only the sort key value and row position, but also the columns required for the query. This avoids reading the rows twice. The modified filesort algorithm works like this: 1. Read the rows that match the WHERE clause, as before. 2. For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query. 3. Sort the tuples by sort key value 4. Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time. Using the modified filesort algorithm, the tuples are longer than the pairs used in the original method, and fewer of them fit in the sort buffer (the size of which is given by sort_buffer_ size). As a result, it is possible for the extra I/O to make the modified approach slower, not faster. To avoid a slowdown, the optimization is used only if the total size of the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data system variable. (A symptom of setting the value of this variable too high is that you will see high disk activity and low CPU activity.) 22 MySQL Internals Manual for version 5.0.1-alpha. 3.2 Bulk Insert The logic behind bulk insert optimization is simple. Instead of writing each key value to B-tree (that is, to the key cache, although the bulk insert code doesn?t know about the key cache), we store keys in a balanced binary (red-black) tree, in memory. When this tree reaches its memory limit, we write all keys to disk (to key cache, that is). But since the key stream coming from the binary tree is already sorted, inserting goes much faster, all the necessary pages are already in cache, disk access is minimized, and so forth. 3.3 How MySQL Does Caching MySQL has the following caches. (Note that the some of the filenames contain an incorrect spelling of the word ?cache.?) Key Cache A shared cache for all B-tree index blocks in the different NISAM files. Uses hashing and reverse linked lists for quick caching of the most recently used blocks and quick flushing of changed entries for a specific table. (?mysys/mf_keycash.c?) Record Cache This is used for quick scanning of all records in a table. (?mysys/mf_iocash.c? and ?isam/_cash.c?) Table Cache This holds the most recently used tables. (?sql/sql_base.cc?) Hostname Cache For quick lookup (with reverse name resolving). This is a must when you have a slow DNS. (?sql/hostname.cc?) Privilege Cache To allow quick change between databases, the last used privileges are cached for each user/database combination. (?sql/sql_acl.cc?) Heap Table Cache Many uses of GROUP BY or DISTINCT cache all found rows in a HEAP table. (This is a very quick in-memory table with hash index.) Join Buffer Cache For every ?full join? in a SELECT statement the rows found are cached in a join cache. (A ?full join? here means there were no keys that could be used to find rows for the next table in the list.) In the worst case, one SELECT query can use many join caches. 3.4 How MySQL Uses the Join Buffer Cache Basic information about the join buffer cache: ? The size of each join buffer is determined by the value of the join_buffer_size system variable. ? This buffer is used only when the join is of type ALL or index (in other words, when no possible keys can be used). ? A join buffer is never allocated for the first non-const table, even if it would be of type ALL or index. Chapter 3: Important Algorithms and Structures 23 ? The buffer is allocated when we need to do a full join between two tables, and freed after the query is done. ? Accepted row combinations of tables before the ALL/index are stored in the cache and are used to compare against each read row in the ALL table. ? We only store the used columns in the join buffer, not the whole rows. Assume you have the following join: Table name Type t1 range t2 ref t3 ALL The join is then done as follows: - While rows in t1 matching range - Read through all rows in t2 according to reference key - Store used fields from t1, t2 in cache - If cache is full - Read through all rows in t3 - Compare t3 row against all t1, t2 combinations in cache - If row satisfies join condition, send it to client - Empty cache - Read through all rows in t3 - Compare t3 row against all stored t1, t2 combinations in cache - If row satisfies join condition, send it to client The preceding description means that the number of times table t3 is scanned is determined as follows: S = size-of-stored-row(t1,t2) C = accepted-row-combinations(t1,t2) scans = (S * C)/join_buffer_size + 1 Some conclusions: ? The larger the value of join_buffer_size, the fewer the scans of t3. If join_buffer_size is already large enough to hold all previous row combinations, there is no speed to be gained by making it larger. ? If there are several tables of join type ALL or index, then we allocate one buffer of size join_buffer_size for each of them and use the same algorithm described above to handle it. (In other words, we store the same row combination several times into different buffers.) 3.5 How MySQL Handles FLUSH TABLES ? FLUSH TABLES is handled in ?sql/sql_base.cc::close_cached_tables()?. ? The idea of FLUSH TABLES is to force all tables to be closed. This is mainly to ensure that if someone adds a new table outside of MySQL (for example, by copying files into a database directory with cp), all threads will start using the new table. This will also ensure that all table changes are flushed to disk (but of course not as optimally as simply calling a sync for all tables)! ? When you do a FLUSH TABLES, the variable refresh_version is incremented. Every time a thread releases a table, it checks if the refresh version of the table (updated at open) is the same as the current refresh_version. If not, it will close it and broadcast a signal on COND_refresh (to await any thread that is waiting for all instances of a table to be closed). 24 MySQL Internals Manual for version 5.0.1-alpha. ? The current refresh_version is also compared to the open refresh_version after a thread gets a lock on a table. If the refresh version is different, the thread will free all locks, reopen the table and try to get the locks again. This is just to quickly get all tables to use the newest version. This is handled by ?sql/lock.cc::mysql_lock_tables()? and ?sql/sql_base.cc::wait_for_tables()?. ? When all tables have been closed, FLUSH TABLES returns an okay to the client. ? If the thread that is doing FLUSH TABLES has a lock on some tables, it will first close the locked tables, then wait until all other threads have also closed them, and then reopen them and get the locks. After this it will give other threads a chance to open the same tables. 3.6 Full-text Search in MySQL Hopefully, sometime there will be complete description of full-text search algorithms. For now, it?s just unsorted notes. Weighting in boolean mode The basic idea is as follows: In an expression of the form A or B or (C and D and E), either A or B alone is enough to match the whole expression, whereas C, D, and E should all match. So it?s reasonable to assign weight 1 to each of A, B, and (C and D and E). Furthermore, C, D, and E each should get a weight of 1/3. Things become more complicated when considering boolean operators, as used in MySQL fulltext boolean searching. Obviously, +A +B should be treated as A and B, and A B - as A or B. The problem is that +A B can not be rewritten in and/or terms (that?s the reason why this? extended?set of operators was chosen). Still, aproximations can be used. +A B C can be approximated as A or (A and (B or C)) or as A or (A and B) or (A and C) or (A and B and C). Applying the above logic (and omitting mathematical transformations and normalization) one gets that for +A_1 +A_2 ... +A_N B_1 B_2 ... B_M the weights should be: A_i = 1/N, B_j=1 if N==0, and, otherwise, in the first rewriting approach B_j = 1/3, and in the second one - B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1)). The second expression gives a somewhat steeper increase in total weight as number of matched B_j values increases, because it assigns higher weights to individual B_j values. Also, the first expression is much simpler, so it is the first one that is implemented in MySQL. 3.7 Functions in the mysys Library Functions in mysys: (For flags see ?my_sys.h?) int my_copy _A((const char *from, const char *to, myf MyFlags)); Copy file from from to to. int my_rename _A((const char *from, const char *to, myf MyFlags)); Rename file from from to to. int my_delete _A((const char *name, myf MyFlags)); Delete file name. int my_redel _A((const char *from, const char *to, int MyFlags)); Delete from before rename of to to from. Copies state from old file to new file. If MY_COPY_TIME is set, sets old time. int my_getwd _A((string buf, uint size, myf MyFlags)); int my_setwd _A((const char *dir, myf MyFlags)); Get and set working directory. Chapter 3: Important Algorithms and Structures 25 string my_tempnam _A((const char *dir, const char *pfx, myf MyFlags)); Make a unique temporary file name by using dir and adding something after pfx to make the name unique. The file name is made by adding a unique six character string and TMP_EXT after pfx. Returns pointer to malloc()?ed area for filename. Should be freed by free(). File my_open _A((const char *FileName,int Flags,myf MyFlags)); File my_create _A((const char *FileName, int CreateFlags, int AccsesFlags, myf MyFlags)); int my_close _A((File Filedes, myf MyFlags)); uint my_read _A((File Filedes, byte *Buffer, uint Count, myf MyFlags)); uint my_write _A((File Filedes, const byte *Buffer, uint Count, myf MyFlags)); ulong my_seek _A((File fd,ulong pos,int whence,myf MyFlags)); ulong my_tell _A((File fd,myf MyFlags)); Use instead of open, open-with-create-flag, close, read, and write to get automatic error messages (flag MYF_WME) and only have to test for != 0 if error (flag MY_NABP). FILE *my_fopen _A((const char *FileName,int Flags,myf MyFlags)); FILE *my_fdopen _A((File Filedes,int Flags,myf MyFlags)); int my_fclose _A((FILE *fd,myf MyFlags)); uint my_fread _A((FILE *stream,byte *Buffer,uint Count,myf MyFlags)); uint my_fwrite _A((FILE *stream,const byte *Buffer,uint Count, myf MyFlags)); ulong my_fseek _A((FILE *stream,ulong pos,int whence,myf MyFlags)); ulong my_ftell _A((FILE *stream,myf MyFlags)); Same read-interface for streams as for files. gptr _mymalloc _A((uint uSize,const char *sFile,uint uLine, myf MyFlag)); gptr _myrealloc _A((string pPtr,uint uSize,const char *sFile,uint uLine, myf MyFlag)); void _myfree _A((gptr pPtr,const char *sFile,uint uLine)); int _sanity _A((const char *sFile,unsigned int uLine)); gptr _myget_copy_of_memory _A((const byte *from,uint length,const char *sFile, uint uLine,myf MyFlag)); malloc(size,myflag) is mapped to these functions if not compiled with -DSAFEMALLOC. void TERMINATE _A((void)); Writes malloc() info on stdout if compiled with -DSAFEMALLOC. int my_chsize _A((File fd, ulong newlength, myf MyFlags)); Change size of file fd to newlength. void my_error _D((int nr, myf MyFlags, ...)); Writes message using error number (see ?mysys/errors.h?) on stdout, or using curses, if MYSYS_PROGRAM_USES_CURSES() has been called. void my_message _A((const char *str, myf MyFlags)); Writes str on stdout, or using curses, if MYSYS_PROGRAM_USES_CURSES() has been called. void my_init _A((void )); Start each program (in main()) with this. void my_end _A((int infoflag)); Gives info about program. If infoflag & MY_CHECK_ERROR, prints if some files are left open. If infoflag & MY_GIVE_INFO, prints timing info and malloc() info about program. 26 MySQL Internals Manual for version 5.0.1-alpha. int my_copystat _A((const char *from, const char *to, int MyFlags)); Copy state from old file to new file. If MY_COPY_TIME is set, sets old time. string my_filename _A((File fd)); Returns filename of open file. int dirname _A((string to, const char *name)); Copy name of directory from filename. int test_if_hard_path _A((const char *dir_name)); Test if dir_name is a hard path (starts from root). void convert_dirname _A((string name)); Convert dirname according to system. On Windows, changes all characters to capitals and changes ?/? to ?\?. string fn_ext _A((const char *name)); Returns pointer to extension in filename. string fn_format _A((string to,const char *name,const char *dsk,const char *form,int flag)); Format a filename with replacement of library and extension and convert between different systems. The to and name parameters may be identical. Function doesn?t change name if name != to. flag may be: 1 Force replace filnames library with ?dsk? 2 Force replace extension with ?form? */ 4 Force unpack filename (replace ~ with home directory) 8 Pack filename as short as possible for output to user All open requests should always use at least open(fn_format(temp_buffer, name, "", "", 4), ...) to unpack home and convert filename to system-form. string fn_same _A((string toname, const char *name, int flag)); Copies directory and extension from name to toname if needed. Copying can be forced by same flags used in fn_format(). int wild_compare _A((const char *str, const char *wildstr)); Compare if str matches wildstr. wildstr can contain ?*? and ??? as wildcard characters. Returns 0 if str and wildstr match. void get_date _A((string to, int timeflag)); Get current date in a form ready for printing. void soundex _A((string out_pntr, string in_pntr)) Makes in_pntr to a 5 char long string. All words that sound alike have the same string. int init_key_cache _A((ulong use_mem, ulong leave_this_much_mem)); Use caching of keys in MISAM, PISAM, and ISAM. KEY_CACHE_SIZE is a good size. Remember to lock databases for optimal caching. void end_key_cache _A((void)); End key caching. Chapter 4: Charsets and Related Issues 27 4 Charsets and Related Issues 4.1 CHARSET_INFO Structure 28 MySQL Internals Manual for version 5.0.1-alpha. Chapter 5: How MySQL Performs Different Selects 29 5 How MySQL Performs Different Selects 5.1 Steps of Select Execution Every select is performed in these base steps: ? JOIN::prepare ? Initialization and linking JOIN structure to st_select_lex. ? fix_fields() for all items (after fix_fields(), we know everything about item). ? Moving HAVING to WHERE if possible. ? Initialization procedure if there is one. ? JOIN::optimize ? Single select optimization. ? Creation of first temporary table if needed. ? JOIN::exec ? Performing select (a second temporary table may be created). ? JOIN::cleanup ? Removing all temporary tables, other cleanup. ? JOIN::reinit ? Prepare all structures for execution of SELECT (with JOIN::exec). 5.2 select_result Class This class has a very important role in SELECT performance with select_result class and classes inherited from it (usually called with a select_ prefix). This class provides the interface for transmitting results. The key methods in this class are the following: ? send_fields sends given item list headers (type, name, etc.). ? send_data sends given item list values as row of table of result. ? send_error is used mainly for error interception, making some operation and then ::send_ error will be called. For example, there are the following select_result classes: ? select_send used for sending results though network layer. ? select_export used for exporting data to file. ? multi_delete used for multi-delete. ? select_insert used for INSERT ... SELECT ... ? multi_update used for multi-update. ? select_singlerow_subselect used for row and scalar subqueries.. ? select_exists_subselect used for EXISTS/IN/ALL/ANY/SOME subqueries. ? select_max_min_finder_subselect used for min/max subqueries (ALL/ANY subquery optimization). 30 MySQL Internals Manual for version 5.0.1-alpha. 5.3 SIMPLE or PRIMARY SELECT For performing single primary select, SELECT uses the mysql_select function, which does: ? allocate JOIN ? JOIN::prepare ? JOIN::optimize ? JOIN::exec ? JOIN::cleanup In previous versions of MySQL, all SELECT operations were performed with the help of this function and mysql_select() was not divided into parts. 5.4 Structure Of Complex Select There are two structures that describe selects: ? st_select_lex (SELECT_LEX) for representing SELECT itself ? st_select_lex_unit (SELECT_LEX_UNIT) for grouping several selects in a bunch The latter item represents UNION operation (the absence of UNION is a union with only one SELECT and this structure is present in any case). In the future, this structure will be used for EXCEPT and INTERSECT as well. For example: (SELECT ...) UNION (SELECT ... (SELECT...)...(SELECT...UNION...SELECT)) 1 2 3 4 5 6 7 will be represented as: ------------------------------------------------------------------------ level 1 SELECT_LEX_UNIT(2) | +---------------+ | | SELECT_LEX(1) SELECT_LEX(3) | --------------- | ------------------------------------------------------ | level 2 +-------------------+ | | SELECT_LEX_UNIT(4) SELECT_LEX_UNIT(6) | | | +--------------+ | | | SELECT_LEX(4) SELECT_LEX(5) SELECT_LEX(7) ------------------------------------------------------------------------ Note: Single subquery 4 has its own SELECT_LEX_UNIT. The uppermost SELECT_LEX_UNIT (#2 in example) is stored in LEX. The first and uppermost SELECT_LEX (#1 in example) is stored in LEX, too. These two structures always exist. At the time of creating or performing any JOIN::* operation, LEX::current_select points to an appropriate SELECT_LEX. Chapter 5: How MySQL Performs Different Selects 31 Only during parsing of global ORDER BY and LIMIT clauses (for the whole UNION), LEX::current_ select points to SELECT_LEX_UNIT of this unit, in order to store this parameter in this SELECT_ LEX_UNIT. SELECT_LEX and SELECT_LEX_UNIT are inherited from st_select_lex_node. 5.5 Non-Subquery UNION Execution Non-subquery unions are performed with the help of mysql_union(). For now, it is divided into the following steps: ? st_select_lex_unit::prepare (the same procedure can be called for single SELECT for derived table => we have support for it in this procedure, but we will not describe it here): ? Create select_union (inherited from select_result) which will write select results in this temporary table, with empty temporary table entry. We will need this object to store in every JOIN structure link on it, but we have not (yet) temporary table structure. ? Allocate JOIN structures and execute JOIN::prepare() for every SELECT to get full information about types of elements of SELECT list (results). Merging types of result fields and storing them in special Items (Item_type_holder) will be done in this loop, too. Result of this operation (list of types of result fields) will be stored in st_select_ lex_unit::types). ? Create a temporary table for storing union results (if UNION without ALL option, ?distinct? parameter will be passed to the table creation procedure). ? Assign a temporary table to the select_union object created in the first step. ? st_select_lex_unit::exec ? Delete rows from the temporary table if this is not the first call. ? if this is the first call, call JOIN::optimize else JOIN::reinit and then JOIN::exec for all SELECTs (select_union will write a result for the temporary table). If union is cacheable and this is not the first call, the method will do nothing. ? Call mysql_select on temporary table with global ORDER BY and LIMIT parameters after collecting results from all SELECTs. A special fake_select_lex (SELECT_LEX) which is created for every UNION will be passed for this procedure (this SELECT_LEX also can be used to store global ORDER BY and LIMIT parameters if brackets used in a query). 5.6 Derived Table Execution ?Derived tables? is the internal name for subqueries in the FROM clause. The processing of derived tables is now included in the table opening process (open_and_lock_ tables() call). Routine of execution derived tables and substituting temporary table instead of it (mysql_handle_derived()) will be called just after opening and locking all real tables used in query (including tables used in derived table query). If lex->derived_tables flag is present, all SELECT_LEX structures will be scanned (there is a list of all SELECT_LEX structures in reverse order named lex->all_selects_list, the first SELECT in the query will be last in this list). There is a pointer for the derived table, SELECT_LEX_UNIT stored in the TABLE_LIST structure (TABLE_LIST::derived). For any table that has this pointer, mysql_derived() will be called. mysql_derived(): ? Creates union_result for writing results in this table (with empty table entry, same as for UNIONs). 32 MySQL Internals Manual for version 5.0.1-alpha. ? call unit->prepare() to get list of types of result fields (it work correctly for single SELECT, and do not create temporary table for UNION processing in this case). ? Creates a temporary table for storing results. ? Assign this temporary table to union_result object. ? Calls mysql_select or mysql_union to execute the query. ? If it is not explain, then cleanup JOIN structures after execution (EXPLAIN needs data of optimization phase and cleanup them after whole query processing). ? Stores pointer to this temporary table in TABLE_LIST structure, then this table will be used by outer query. ? Links this temporary table in thd->derived_tables for removing after query execution. This table will be closed in close_thread_tables if its second parameter (bool skip_ derived) is true. 5.7 Subqueries In expressions, subqueries (that is, subselects) are represented by Item inherited from Item_ subselect. To hide difference in performing single SELECTs and UNIONs, Item_subselect uses two different engines, which provide uniform interface for access to underlying SELECT or UNION (subselect_ single_select_engine and subselect_union_engine, both are inherited from subselect_ engine). The engine will be created at the time Item_select is constructed (Item_subselect::init method). On Item_subselect::fix_fields(), engine->prepare() will be called. Before calling any value-getting method (val, val_int, val_str, bring_value (in case of row result)) engine->exec() will be called, which executes the query or just does nothing if subquery is cacheable and has already been executed. Inherited items have their own select result classes. There are two types of them: ? select_singlerow_subselect, to store values of given rows in Item_singlerow_ subselect cache on send_data() call, and report error if Item_subselect has ?assigned? attribute. ? select_exists_subselect just store 1 as value of Item_exists_subselect on send_ data() call. Since Item_in_subselect and Item_allany_subselect are inherited from Item_exists_subselect, they use the same select_result class. Item_select will never call the cleanup() procedure for JOIN. Every JOIN::cleanup will call cleanup() for inner JOINs. The uppermost JOIN::cleanup will be called by mysql_select() or mysql_union(). 5.8 Single Select Engine subselect single select engine: ? constructor allocate JOIN and store pointers on SELECT_LEX and JOIN. ? prepare() call JOIN::prepare. ? fix_length_and_dec() prepare cache and receive type and parameters of returning items (called only by Item_singlerow_subselect). ? exec() drop ?assigned? flag of Item_subselect. If this is the first time, call JOIN::optimize and JOINexec(), else do nothing or JOIN::reinit() JOIN::exec() depending on type of subquery. Chapter 5: How MySQL Performs Different Selects 33 5.9 Union Engine subselect_union_engine: ? constructor just store pointer to st_select_lex_union (SELECT_LEX_UNION). ? prepare() call st_select_lex_unit::prepare. ? fix_length_and_dec() prepare cache and receive type and parameters (maximum of length) of returning items (called only by Item_singlerow_subselect). ? exec() call st_select_lex_unit::exec(). st_select_lex_unit::exec() can drop ?assigned? flag of Item_subselect if st_select_lex_unit::item is not 0. 5.10 Special Engines There are special engines used for optimization purposes. These engines do not have a full range of features. They can only fetch data. The normal engine can be replaced with such special engines only during the optimization process. Now we have two such engines: ? subselect_uniquesubquery_engine used for: left_expression IN (SELECT primary_key FROM table WHERE conditions) This looks for the given value once in a primary index, checks the WHERE condition, and returns ?was it found or not?? ? subselect_indexsubquery_engine used for: left_expression IN (SELECT any_key FROM table WHERE conditions) This first looks up the value of the left expression in an index (checking the WHERE condition), then if value was not found, it checks for NULL values so that it can return NULL correctly (only if a NULL result makes sense, for example if an IN subquery is the top item of the WHERE clause then NULL will not be sought) The decision about replacement of the engine happens in JOIN::optimize, after calling make_ join_readinfo, when we know what the best index choice is. 5.11 Explain Execution For an EXPLAIN statement, for every SELECT, mysql_select will be called with option SELECT_ DESCRIBE. For main UNION, mysql_explain_union will be called. For every SELECT in a given union, mysql_explain_union will call mysql_explain_select. mysql_explain_select will call mysql_select with option SELECT_DESCRIBE. mysql_select creates a JOIN for select if it does not already exist (it might already exist because if it called for subselect JOIN can be created in JOIN::optimize of outer query when it decided to calculate the value of the subquery). Then it calls JOIN::prepare, JOIN::optimize, JOIN::exec and JOIN::cleanup as usual. JOIN::exec is called for SELECT with SELECT_DESCRIBE option call select_describe. select_describe returns the user description of SELECT and calls mysql_explain_union for every inner UNION. PROBLEM: how it will work with global query optimization? 34 MySQL Internals Manual for version 5.0.1-alpha. Chapter 6: How MySQL Transforms Subqueries 35 6 How MySQL Transforms Subqueries Item_subselect virtual method select_transformer is used to rewrite subqueries. It is called from Item_subselect::init (which is called just after call to fix_fields() method for all items in JOIN::prepare). 6.1 Item_in_subselect::select_transformer Item_in_subselect::select_transformer is divided into two parts, for the scalar left part and the row left part. 6.1.1 Scalar IN Subquery To rewrite a scalar IN subquery, the method used is Item_in_subselect::single_value_ transformer. Scalar IN subquery will be replaced with Item_in_optimizer. Item_in_optimizer item is a special boolean function. On a value request (one of val, val_ int, or val_str methods) it evaluates left expression of IN by storing its value in cache item (one of Item_cache* items), then it tests the cache to see whether it is NULL. If left expression (cache) is NULL, then Item_in_optimizer returns NULL, else it evaluates Item_in_subselect. Example queries. a) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2); b) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2 GROUP BY t2.a); ? Item_in_subselect inherits the mechanism for getting a value from Item_exists_ subselect. ? Select_transformer stores a reference to the left expression in its conditions: (in WHERE and HAVING in case ?a? and in HAVING in case ?b?) ? Item from item list of this select (t2.a) can be referenced with a special reference (Item_ ref_null_helper or Item_null_helper). This reference informs Item_in_optimizer whether item (t2.a) is NULL by setting the ?was null? flag. ? The return value from Item_in_subselect will be evaluated as follows: ? If TRUE, return true ? If NULL, return null (that is, unknown) ? If FALSE, and ?was null? is set, return null ? Return FALSE IN (SELECT ...) will be represented as follows: +-----------------+ |Item_in_optimizer| +-----------------+ | +---------------------+------------+ | | +-----------------------+ +-----------------+ | | |Item_in_subselect| | | +-----------------+ +-----------------------+ | || +-----------+-----------+ | | | | +-----------------------+ | | 36 MySQL Internals Manual for version 5.0.1-alpha. ^ +----------+ +--------------------+ +<<<<<<<<<<<<<<<<<| Item_ref | +<<<|Item_ref_null_helper| +----------+ V +--------------------+ V +--------------------+ +>>>| | +--------------------+ where <<<<<<<<< is reference in meaning of Item_ref. Item_ref is used to point to , because at the time of transformation we know only the address of variable where the cache pointer will be stored. If the select statement has an ORDER BY clause, it will be wiped out, because there is no sense in ORDER BY without LIMIT here. If IN subquery union, the condition of every select in the UNION will be changed individually. If a condition needs to be added to the WHERE clause, it will be presented as (item OR item IS NULL) and Item_is_not_null_test(item) will be added to the HAVING clause. Item_is_not_ null_test registers NULL value the way Item_ref_null_helper does it, and returns FALSE if argument is NULL. With the above trick, we will register NULL value of Item even for the case of index optimization of a WHERE clause (case ?a? in the following example). The following are examples of IN transformations: ? Example 1: IN (SELECT FROM t WHERE ) If returning NULL correctly would make sense, the above will become: (SELECT 1 FROM t WHERE and (Item_ref()= or is null) HAVING Item_is_not_null_test()) When subquery is marked as the top item of the WHERE clause, it will become: (SELECT 1 FROM t WHERE and Item_ref()=) ? Example 2: IN (SELECT FROM t HAVING ORDER BY 1) will be represented as (SELECT as ref_null_helper FROM t HAVING AND Item_ref() = Item_ref_null_helper(item)) ? Example 3: IN (SELECT UNION ...) will become (SELECT 1 HAVING Item_ref()= )> UNION ...) Chapter 6: How MySQL Transforms Subqueries 37 (HAVING without FROM is a syntax error, but a HAVING condition is checked even for subquery without FROM) ? Example 4: IN (select ) will be completely replaced with = Now conditions (WHERE (a) or HAVING (b)) will be changed, depending on the select, in the following way: If subquery contains a HAVING clause, SUM() function or GROUP BY (example 1), then the item list will be unchanged and Item_ref_null_helper reference will be created on item list element. A condition will be added to the HAVING. If the subquery does not contain HAVING, SUM() function or GROUP BY (example 2), then: ? item list will be replaced with 1. ? left_expression cache> = or is null will be added to the WHERE clause and a special is_not_null(item) will be added to the HAVING, so null values will be registered. If returning NULL wouldn?t make correct sense, then only left_expression cache> = will be added to the WHERE clause. If this subquery does not contain a FROM clause or if the subquery contains UNION (example 3), then left_expression cache> = Item_null_helper() will be added to the HAVING clause. A single select without a FROM clause will be reduced to just = without use of Item_in_optimizer. 6.1.2 Row IN Subquery To rewrite a row IN subquery, the method used is Item_in_subselect::row_value_ transformer. It works in almost the same way as the scalar analog, but works with Item_cache_row for caching left expression and uses references for elements of Item_cache_row. To refer to item list it uses Item_ref_null_helper(ref_array+i). Subquery with HAVING, SUM() function, or GROUP BY will transformed in the following way: ROW(l1, l2, ... lN) IN (SELECT i1, i2, ... iN FROM t HAVING ) will become: (SELECT i1, i2, ... iN FROM t HAVING and = AND = AND ... = ) SELECT without FROM will be transformed in this way, too. It will be the same for other subqueries, except for the WHERE clause. 6.2 Item_allany_subselect Item_allany_subselect is inherited from Item_in_subselect. ALL/ANY/SOME use the same algorithm (and the same method of Item_in_subselect) as scalar IN, but use a different function instead of =. ANY/SOME use the same function that was listed after the left expression. ALL uses an inverted function, and all subqueries passed as arguments to Item_func_not_all (Item_func_not_all is a special NOT function used in optimization, see following). 38 MySQL Internals Manual for version 5.0.1-alpha. But before above transformation ability of independent ALL/ANY/SOME optimization will be checked (query is independent, operation is one of <, =<, >, >=, returning correct NULL have no sense (top level of WHERE clause) and it is not row subquery). For such queries, the following transformation can be done: val > ALL (SELECT...) -> val > MAX (SELECT...) val < ALL (SELECT...) -> val < MIN (SELECT...) val > ANY (SELECT...) -> val > MIN (SELECT...) val < ANY (SELECT...) -> val < MAX (SELECT...) val >= ALL (SELECT...) -> val >= MAX (SELECT...) val <= ALL (SELECT...) -> val <= MIN (SELECT...) val >= ANY (SELECT...) -> val >= MIN (SELECT...) val <= ANY (SELECT...) -> val <= MAX (SELECT...) ALL subqueries already have NOT before them. This problem can be solved with help of special NOT, which can bring ?top? tag to its argument and correctly process NULL if it is ?top? item (return TRUE if argument is NULL if it is ?top? item). Let?s call this operation _NOT_. Then we will have following table of transformation: val > ALL (SELECT...) -> _NOT_ val >= MAX (SELECT...) val < ALL (SELECT...) -> _NOT_ val <= MIN (SELECT...) val > ANY (SELECT...) -> val < MIN (SELECT...) val < ANY (SELECT...) -> val > MAX (SELECT...) val >= ALL (SELECT...) -> _NOT_ val > MAX (SELECT...) val <= ALL (SELECT...) -> _NOT_ val < MIN (SELECT...) val >= ANY (SELECT...) -> val <= MIN (SELECT...) val <= ANY (SELECT...) -> val >= MAX (SELECT...) If subquery does not contain grouping and aggregate function, above subquery can be rewritten with MAX()/MIN() aggregate function, for example: val > ANY (SELECT item ...) -> val < (SELECT MIN(item)...) For queries with aggregate function and/or grouping, special Item_maxmin_subselect will be used. This subquery will return maximum (minimum) value of result set. 6.3 Item_singlerow_subselect Item_singlerow_subselect will be rewritten only if it contains no FROM clause, and it is not part of UNION, and it is a scalar subquery. For now, there will be no conversion of subqueries with field or reference on top of item list (on the one hand we can?t change the name of such items, but on the other hand we should assign to it the name of the whole subquery which will be reduced); The following will not be reduced: SELECT a; SELECT 1 UNION SELECT 2; SELECT 1 FROM t1; The following select will be reduced: SELECT 1; SELECT a+2; Such a subquery will be completely replaced by its expression from item list and its SELECT_LEX and SELECT_LEX_UNIT will be removed from SELECT_LEX?s tree. But every Item_field and Item_ref of that expression will be marked for processing by a special fix_fields() procedure. The fix_fields() procedures for such Items will be performed in Chapter 6: How MySQL Transforms Subqueries 39 the same way as for items of an inner subquery. Also, if this expression is Item_fields or Item_ ref, then the name of this new item will be the same as the name of this item (but not (SELECT ...)). This is done to prevent broken references on such items from more inner subqueries.