The `DecorrelatePredicateSubquery` rule in Apache DataFusion is responsible for rewriting correlated subqueries in `WHERE`/`HAVING` clauses (specifically `IN` and `EXISTS` predicates, including their negations) into semijoin or antijoin operations. This transforms a nested query into a flat join-based plan for execution. To achieve this, the rule employs a carefully orchestrated recursion strategy that handles subqueries within subqueries (nested subqueries) and coordinates with DataFusion’s optimizer driver to avoid duplicate traversals.
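
To make the shape of this rewrite concrete, here is a minimal sketch on a toy plan representation. The `Plan`, `Predicate`, and `JoinKind` types and the `decorrelate` function are hypothetical stand-ins invented for illustration; they are not DataFusion's `LogicalPlan`, `Expr`, or the rule's actual code. The sketch also assumes away join conditions, correlation columns, and the null handling that a real `NOT IN` rewrite needs.

```rust
// Toy model only: hypothetical types, not DataFusion's API.

/// A filter predicate that may wrap a subquery.
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    /// `expr [NOT] IN (subquery)`
    InSubquery { negated: bool, subquery: Box<Plan> },
    /// `[NOT] EXISTS (subquery)`
    Exists { negated: bool, subquery: Box<Plan> },
    /// Any predicate without a subquery, kept as an opaque string.
    Plain(String),
}

#[derive(Debug, Clone, PartialEq)]
enum JoinKind {
    Semi,
    Anti,
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter { predicate: Predicate, input: Box<Plan> },
    Join { kind: JoinKind, left: Box<Plan>, right: Box<Plan> },
}

/// Replace every subquery-bearing filter with a semijoin (plain form)
/// or an antijoin (negated form), recursing so that subqueries nested
/// inside other subqueries are flattened as well.
fn decorrelate(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => {
            // Rewrite the outer input before building the join.
            let input = Box::new(decorrelate(*input));
            match predicate {
                Predicate::InSubquery { negated, subquery }
                | Predicate::Exists { negated, subquery } => {
                    let kind = if negated { JoinKind::Anti } else { JoinKind::Semi };
                    // The subquery body becomes the right side of the join;
                    // recursing into it handles nested subqueries.
                    Plan::Join {
                        kind,
                        left: input,
                        right: Box::new(decorrelate(*subquery)),
                    }
                }
                // No subquery: keep the filter as it was.
                plain => Plan::Filter { predicate: plain, input },
            }
        }
        Plan::Join { kind, left, right } => Plan::Join {
            kind,
            left: Box::new(decorrelate(*left)),
            right: Box::new(decorrelate(*right)),
        },
        other => other,
    }
}
```

In this toy model, a `NOT EXISTS` filter over table `a` whose subquery itself contains an `IN` filter over table `b` becomes an antijoin whose right side is a semijoin of `b` and `c`, with no filter nodes left in the plan.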
Top-Down Invocation: The rule is registered to run in a top-down manner. In its implementation of `OptimizerRule`, it overrides `apply_order` to return `Some(ApplyOrder::TopDown)`
([decorrelate_predicate_subquery.rs - source](https://docs.rs/datafusion-optimizer/46.0.1/x86_64-unknown-linux-gnu/src/datafusion_optimizer/decorrelate_predicate_