Let me explain how ClickHouse can utilize an index in this case, despite the expression intDiv(X, 10) = ?
appearing non-sargable at first glance.
The key lies in ClickHouse's ability to perform monotonic function analysis. Here's how it works:
First, let's understand what makes intDiv(X, 10)
special. The integer division function is monotonic - as X increases, intDiv(X, 10)
either increases or stays the same, never decreases. For example:
X = 45 -> intDiv(X, 10) = 4
X = 46 -> intDiv(X, 10) = 4
X = 50 -> intDiv(X, 10) = 5
This capability extends to other monotonic functions as well, not just intDiv
. It's part of ClickHouse's broader strategy of function analysis for query optimization.
There's a more general approach that ClickHouse likely uses:
For a monotonic function f(X)
and a condition f(X) = const
, ClickHouse can:
- Use the index to get a series of X values
- Apply f(X) to these values
- Use monotonicity property to find the boundaries where f(X) = const
So for intDiv(X, 10) = 4
:
- Try some X values from index
- Once it finds where intDiv(X, 10) changes from 3 to 4 (at X=40) and from 4 to 5 (at X=50)
- These become the range boundaries
This is more practical because:
- No need to understand function semantics
- Works for any monotonic function
- Can use binary search due to monotonicity
- Index helps find initial values efficiently
Ah yes! Let me fix the example with values that would actually give us intDiv = 4:
Index positions: 0, 8192, 16384, 24576, 32768, 40960, 49152, ...
Values at index: 5, 25, 35, 39, 41, 45, 51, ...
intDiv(values, 10):
0, 2, 3, 3, 4, 4, 5, ...
Now it's correct for the case intDiv(X, 10) = 4
, because:
- 41/10 = 4 (integer division)
- 45/10 = 4
- 51/10 = 5
ClickHouse binary searches these sparse index values to find where the function result transitions from 3 to 4 and from 4 to 5, then uses those marks to determine which ranges of data need to be read.