PlanMapper helps Hive regenerate better query plans using runtime stats. It groups entities that are semantically the same. For example, a Calcite RelNode that expresses WHERE id = 1 can be equivalent to a Hive FilterOperator, and a CommonMergeJoinOperator can be linked to the MapJoinOperator it was converted into.
The groups generated by PlanMapper express these relationships so that the final runtime stats can be propagated to the RelNodes or Operators of each step. See also: https://cwiki.apache.org/confluence/display/Hive/Query+ReExecution
The following invocations happen in SemanticAnalyzer#analyzeInternal.
- ASTConverter links ASTNodes with RelNodes
  - For table scans, ASTNode <-> RelNode
  - For predicates, ASTNode <-> RelNode <-> RelTreeSignature
- SemanticAnalyzer links ASTNodes with Operators
  - For predicates, ASTNode <-> Operator
- AuxOpTreeSignature MERGEs all Operators with their signatures (aux sig)
- StatsRuleProcFactory links Operators with their signatures to share runtime stats between Operators
- ConvertJoinMapJoin links original join Operators with optimized join Operators
- AuxOpTreeSignature MERGEs all Operators with their signatures (aux sig)
- Vectorizer links non-vectorized Operators with vectorized Operators
Entities are grouped when any of the following conditions is satisfied.
- Any instance is shared, based on ==
- Any OpTreeSignature is shared, based on equals
- ASTConverter links an ASTNode with a RelNode => ASTNode <-> RelNode
- I guess we should link the ASTNode with an Operator
- StatsRuleProcFactory links an Operator with OpTreeSignature => Operator <-> OpTreeSignature
- Finally, we expect two groups to exist
  - ASTNode <-> RelNode
  - Operator <-> OpTreeSignature
- ASTConverter links an ASTNode with a RelNode => ASTNode <-> RelNode
- ASTConverter links an ASTNode with a rel signature => ASTNode <-> RelNode <-> RelTreeSignature
- SemanticAnalyzer links an ASTNode with an Operator => ASTNode <-> RelNode <-> RelTreeSignature <-> Operator
- TableScanPPD replaces the Operator with a new Operator
- StatsRuleProcFactory links an Operator with its signature => replaced Operator <-> OpTreeSignature
- AuxOpTreeSignature MERGEs operators => ASTNode <-> RelNode <-> RelTreeSignature <-> Operator <-> replaced Operator <-> OpTreeSignature
- Vectorizer links an Operator with a vectorized Operator => ASTNode <-> RelNode <-> RelTreeSignature <-> Operator <-> replaced Operator <-> OpTreeSignature <-> vectorized Operator
- Finally, we expect one group to exist
  - ASTNode <-> RelNode <-> RelTreeSignature <-> Operator <-> replaced Operator <-> OpTreeSignature <-> vectorized Operator
- StatsRuleProcFactory links an Operator with its signature => Operator <-> OpTreeSignature
- ConvertJoinMapJoin links an Operator with an optimized Operator => Operator <-> OpTreeSignature <-> optimized Operator
- Vectorizer links an Operator with a vectorized Operator => Operator <-> OpTreeSignature <-> optimized Operator <-> vectorized Operator
- Finally, we expect one group to exist
  - Operator <-> OpTreeSignature <-> optimized Operator <-> vectorized Operator
- StatsRuleProcFactory links an Operator with its signature => Operator <-> OpTreeSignature
- Vectorizer links an Operator with a vectorized Operator => Operator <-> OpTreeSignature <-> vectorized Operator
- Finally, we expect one group to exist
  - Operator <-> OpTreeSignature <-> vectorized Operator
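The grouping rules and link sequences above can be illustrated with a minimal Java sketch. It is not Hive's PlanMapper implementation: GroupMapperSketch, Sig, and the methods link, sameGroup, and groupCount are hypothetical names chosen for illustration, and the sketch always merges groups when a link connects them, which may differ from the real behavior. Ordinary entities are matched by identity (==) and signatures by equals.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an operator signature; compared with equals(). */
final class Sig {
    private final String shape;
    Sig(String shape) { this.shape = shape; }
    @Override public boolean equals(Object o) {
        return o instanceof Sig && ((Sig) o).shape.equals(shape);
    }
    @Override public int hashCode() { return shape.hashCode(); }
}

/** Illustrative grouping structure (an assumption, not Hive's API). */
class GroupMapperSketch {
    // Ordinary entities are looked up by identity (==).
    private final Map<Object, Set<Object>> byIdentity = new IdentityHashMap<>();
    // Signatures are looked up by equals()/hashCode().
    private final Map<Sig, Set<Object>> bySignature = new HashMap<>();
    // All live groups, tracked by identity because the member sets are mutable.
    private final Set<Set<Object>> groups = Collections.newSetFromMap(new IdentityHashMap<>());

    private Set<Object> groupOf(Object o) {
        return (o instanceof Sig) ? bySignature.get((Sig) o) : byIdentity.get(o);
    }

    private void register(Object o, Set<Object> group) {
        group.add(o);
        if (o instanceof Sig) {
            bySignature.put((Sig) o, group);
        } else {
            byIdentity.put(o, group);
        }
    }

    private Set<Object> newGroup() {
        Set<Object> g = Collections.newSetFromMap(new IdentityHashMap<>());
        groups.add(g);
        return g;
    }

    /** Puts a and b into one group, merging their existing groups if both have one. */
    void link(Object a, Object b) {
        Set<Object> ga = groupOf(a);
        Set<Object> gb = groupOf(b);
        Set<Object> target = ga != null ? ga : (gb != null ? gb : newGroup());
        if (ga != null && gb != null && ga != gb) {
            for (Object o : gb) {
                register(o, ga); // move every member of gb into ga
            }
            groups.remove(gb);
        }
        register(a, target);
        register(b, target);
    }

    boolean sameGroup(Object a, Object b) {
        Set<Object> g = groupOf(a);
        return g != null && g == groupOf(b);
    }

    int groupCount() { return groups.size(); }
}

class GroupMapperDemo {
    public static void main(String[] args) {
        GroupMapperSketch mapper = new GroupMapperSketch();
        Object operator = new Object();
        Object vectorized = new Object();
        mapper.link(operator, new Sig("FIL")); // Operator <-> OpTreeSignature
        mapper.link(operator, vectorized);     // ... <-> vectorized Operator
        System.out.println(mapper.groupCount()); // prints 1
    }
}
```

Linking a different operator instance to an equal signature, for example `mapper.link(new Object(), new Sig("FIL"))`, places it in the same group, which mirrors the second grouping condition.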
Operators with the same shape are unified by StatsRuleProcFactory or AuxOpTreeSignature. For example, in the following query, the TableScanOperators, FilterOperators, and SelectOperators are unified because they have the same signatures.
EXPLAIN CBO
SELECT a.key, a.value, b.key, b.value
FROM src a
JOIN src b ON a.key = b.key
WHERE a.key != '1' AND b.key != '1'
ORDER BY a.key;
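Why the two sides of this self-join end up with equal signatures can be sketched in Java. This is only an illustration under stated assumptions: ShapeSig is a hypothetical class, and the sketch assumes a signature is built from the operator kind, a normalized description, and the parent signatures, with the table alias (a or b) left out. Hive's actual signature contents may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical structural signature: operator kind, normalized detail, parent signatures. */
final class ShapeSig {
    final String kind;
    final String detail;
    final List<ShapeSig> parents;

    ShapeSig(String kind, String detail, ShapeSig... parents) {
        this.kind = kind;
        this.detail = detail;
        this.parents = Arrays.asList(parents);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ShapeSig)) {
            return false;
        }
        ShapeSig other = (ShapeSig) o;
        // Equality is recursive: parents are compared by their own signatures.
        return kind.equals(other.kind)
            && detail.equals(other.detail)
            && parents.equals(other.parents);
    }

    @Override public int hashCode() {
        return Objects.hash(kind, detail, parents);
    }
}

class ShapeSigDemo {
    /** Builds TS -> FIL -> SEL for one side of the join; the alias is assumed not to be part of the signature. */
    static ShapeSig scanFilterSelect() {
        ShapeSig ts = new ShapeSig("TS", "src");
        ShapeSig fil = new ShapeSig("FIL", "key <> '1'", ts);
        return new ShapeSig("SEL", "key, value", fil);
    }

    public static void main(String[] args) {
        ShapeSig sideA = scanFilterSelect();
        ShapeSig sideB = scanFilterSelect();
        System.out.println(sideA == sideB);      // prints false: distinct instances
        System.out.println(sideA.equals(sideB)); // prints true: same shape
    }
}
```

Because the two chains are distinct objects with equal signatures, a grouping keyed on equals would place the corresponding operators of both sides in the same group.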