Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save den-crane/ee3d4ed06d547f58fd8ea728c25c42f9 to your computer and use it in GitHub Desktop.
Save den-crane/ee3d4ed06d547f58fd8ea728c25c42f9 to your computer and use it in GitHub Desktop.
Clickhouse index leverage for monotonic functions

Let me explain how ClickHouse can utilize an index in this case, despite the expression intDiv(X, 10) = ? appearing non-sargable at first glance.

The key lies in ClickHouse's ability to perform monotonic function analysis. Here's how it works:

First, let's understand what makes intDiv(X, 10) special. The integer division function is monotonic - as X increases, intDiv(X, 10) either increases or stays the same, never decreases. For example:

X = 45 -> intDiv(X, 10) = 4
X = 46 -> intDiv(X, 10) = 4
X = 50 -> intDiv(X, 10) = 5

This capability extends to other monotonic functions as well, not just intDiv. It's part of ClickHouse's broader strategy of function analysis for query optimization.

There's a more general approach that ClickHouse likely uses:

For a monotonic function f(X) and a condition f(X) = const, ClickHouse can:

  1. Use the index to get a series of X values
  2. Apply f(X) to these values
  3. Use monotonicity property to find the boundaries where f(X) = const

So for intDiv(X, 10) = 4:

  1. Try some X values from index
  2. Once it finds where intDiv(X, 10) changes from 3 to 4 (at X=40) and from 4 to 5 (at X=50)
  3. These become the range boundaries

This is more practical because:

  • No need to understand function semantics
  • Works for any monotonic function
  • Can use binary search due to monotonicity
  • Index helps find initial values efficiently

Ah yes! Let me fix the example with values that would actually give us intDiv = 4:

Index positions: 0,     8192,   16384,  24576,  32768,  40960,  49152, ...
Values at index: 5,     25,     35,     39,     41,     45,     51, ...

intDiv(values, 10):
0,     2,      3,      3,      4,      4,      5, ...

Now it's correct for the case intDiv(X, 10) = 4, because:

  • 41/10 = 4 (integer division)
  • 45/10 = 4
  • 51/10 = 5

ClickHouse binary searches these sparse index values to find where the function result transitions from 3 to 4 and from 4 to 5, then uses those marks to determine which ranges of data need to be read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment