eXist-db Query Optimizer and Index usage

Over the years I have gathered some findings on how small adjustments to XPath queries can lead to better performance. They are not definitive; test whether they apply to the specific slow-running queries you are having issues with. I assume you have already profiled your queries in the bundled monex application. Here is a bit of documentation on this topic to help you get started: http://exist-db.org/exist/apps/doc/indexing.xml?field=all&id=D3.19#check-usage On a typical local instance the profiler is available at: http://localhost:8080/exist/apps/monex/profiling.html

General rules on predicates

Keeping predicates separate

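[EXPR1 and EXPR2] is slower than [EXPR1][EXPR2] if either EXPR1 or EXPR2 can use an index. The two forms select the same nodes as long as neither expression is positional (a number, or a call to position() or last()).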
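As a minimal sketch of the two forms, the query below runs both against a small in-memory fragment. The element and attribute names are made up for illustration. In-memory nodes are not indexed, so this only shows that the results are identical; any speed difference has to be checked in the monex profiler against stored, indexed data.

```xquery
xquery version "3.1";

(: Hypothetical sample data; in a real test these nodes would be stored and indexed. :)
let $data :=
    <root>
        <a b="c" d="1"/>
        <a b="c" d="2"/>
        <a b="x" d="1"/>
    </root>

(: One predicate combining both conditions with "and" :)
let $combined := $data/a[@b = 'c' and @d = '1']

(: The same conditions written as two separate predicates :)
let $separate := $data/a[@b = 'c'][@d = '1']

(: Both select only the first a element, so this returns true :)
return deep-equal($combined, $separate)
```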

Order of predicates

Changing the order of predicates can improve the performance of a query. It is usually best to put the predicate that reduces the node set the most first.
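A sketch of what this could look like, assuming a hypothetical collection in which @id matches very few entry elements while @type matches many. The collection path and all names are assumptions; whether the reordering pays off should be verified in the profiler.

```xquery
xquery version "3.1";

(: Hypothetical collection path, element and attribute names. :)
(: @id is assumed to be the most selective condition, so it comes first. :)
collection('/db/apps/myapp/data')//entry[@id = 'p123'][@type = 'person']
```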

Rewriting predicates

Let's assume you have a new range index on @b. If a[@b = 'c'] shows up in monex with "no index", rewriting it to a/@b[. = 'c']/.. might help the optimiser pick up the index.
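To confirm that such a rewrite does not change the result, both forms can be compared on a small sample first. The sketch below uses made-up in-memory data, which is not indexed; it checks equivalence only, not index usage.

```xquery
xquery version "3.1";

(: Hypothetical sample data :)
let $data :=
    <root>
        <a b="c">first</a>
        <a b="x">second</a>
        <a>third</a>
    </root>

(: Original form: filter a by its attribute :)
let $original := $data/a[@b = 'c']

(: Rewritten form: select the attribute, filter it, step back to the parent :)
let $rewritten := $data/a/@b[. = 'c']/..

(: Both select only the first a element, so this returns true :)
return deep-equal($original, $rewritten)
```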

Sequences of queries

The optimiser is currently not able to work on sequences of queries. Let's assume two queries, QUERY1 and QUERY2, where at least one of them would make use of an index and could therefore benefit from optimisation.

(QUERY1, QUERY2) is not optimised, but refactoring it into separate let assignments will work:

let $first := QUERY1
let $second := QUERY2
return ($first, $second)

You will see that the queries can now be optimised and therefore perform much better.
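With concrete queries in place of QUERY1 and QUERY2, the refactored form might look like the following sketch. The collection path, element names and attribute values are hypothetical, and an index on @id is assumed.

```xquery
xquery version "3.1";

(: Hypothetical collection path and names; an index on @id is assumed. :)
let $persons := collection('/db/apps/myapp/data')//person[@id = 'p123']
let $places  := collection('/db/apps/myapp/data')//place[@id = 'l456']

(: Each query is bound on its own, then the results are combined. :)
return ($persons, $places)
```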

Queries on Unions of Collections

Here we have two collections that both have an index on a. The union of the two is queried with the same PREDICATE:

($collection-a//a[PREDICATE] | $collection-b//a[PREDICATE])

This should be rewritten to

($collection-a | $collection-b)//a[PREDICATE]

This drastically improved performance in testing.

Interestingly, we observed that

($collection-a, $collection-b)//a[PREDICATE]

which produces the same results, was slightly slower.
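A sketch of the preferred form with the collections bound explicitly. The collection paths and the predicate are placeholders chosen for illustration.

```xquery
xquery version "3.1";

(: Hypothetical collection paths :)
let $collection-a := collection('/db/apps/myapp/data/a')
let $collection-b := collection('/db/apps/myapp/data/b')

(: Build the union of the collections first, then apply the path and predicate once. :)
return ($collection-a | $collection-b)//a[@b = 'c']
```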

Predicates with function calls

Function calls within a predicate can prevent the optimiser from recognising index usage and thereby slow down queries.

Suppose, for example, we want to select all a elements that have a child element b whose attribute c is not equal to "value":

a[b[not(@c = "value")]]

This can be rewritten to

a[b[@c ne "value"]]

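This expresses the same selection as long as every b has a c attribute, and it benefits from optimisation and index usage. For a b without @c the two differ: not(@c = "value") is true, whereas @c ne "value" is not.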
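A small sketch to check whether the rewrite is safe for your data. With the general comparison wrapped in not(), a b without a c attribute satisfies the predicate; with ne it does not, because comparing an empty sequence yields an empty result. The sample data is made up for illustration.

```xquery
xquery version "3.1";

(: Hypothetical sample data: the third a has a b child without @c. :)
let $data :=
    <root>
        <a id="1"><b c="value"/></a>
        <a id="2"><b c="other"/></a>
        <a id="3"><b/></a>
    </root>

return map {
    (: selects ids "2" and "3": the b without @c passes not(...) :)
    "with-not": $data/a[b[not(@c = "value")]]/@id/string(),
    (: selects id "2" only: the b without @c is dropped :)
    "with-ne": $data/a[b[@c ne "value"]]/@id/string()
}
```

If elements without the attribute can occur and must be kept, the ne form is not a drop-in replacement.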

Full text queries on Unions and Sequences of mixed nodes

While refactoring existing code, we found one more pattern that should generally be avoided: the first parameter of ft:query should be neither a union nor a sequence of mixed nodes.

ft:query(a | b, SEARCH) and, even more so, ft:query((a, b), SEARCH) are much slower than

(ft:query(a, SEARCH) | ft:query(b, SEARCH))

(ft:query(a, SEARCH), ft:query(b, SEARCH))
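As a sketch, the union variant could look like this in a concrete query. It assumes a Lucene full-text index is configured on the hypothetical title and p elements; the collection path and search term are placeholders as well.

```xquery
xquery version "3.1";

import module namespace ft = "http://exist-db.org/xquery/lucene";

(: Hypothetical collection path, element names and search term :)
let $docs := collection('/db/apps/myapp/data')
let $search := 'optimiser'

(: One ft:query call per node type, combined afterwards :)
return (
    ft:query($docs//title, $search)
    | ft:query($docs//p, $search)
)
```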

Final note

The above are the findings of many test sessions. The cycle is to

1. start the profiler in monex
2. run the original query
3. look at the index usage tab in the monex profiler and the timing for the query under test
4. note down the values and reset the profiler
5. refactor the query
6. compare the previous values with the new result in the monex profiler
7. repeat from step 5 until satisfied

Tests can be done

in a stripped-down main-module that is evaluated in eXide
or by changing the query in-place in your application in a test environment
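A stripped-down main module for eXide might look like the sketch below. The collection path and query are placeholders, and the timing relies on the let clauses being evaluated in order, which is an assumption about the engine rather than something XQuery guarantees. The monex profiler remains the more reliable source for timings and index usage.

```xquery
xquery version "3.1";

import module namespace util = "http://exist-db.org/xquery/util";

(: Take a timestamp, run the query under test, take another timestamp. :)
let $start := util:system-dateTime()

(: Hypothetical query under test :)
let $result := collection('/db/apps/myapp/data')//a[@b = 'c']

let $end := util:system-dateTime()

return map {
    "hits": count($result),
    "elapsed": $end - $start
}
```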

line-o/index-usage.md