I have put together some findings, gathered over the years, on how small adjustments to XPath queries can lead to better performance. They are not definitive: test whether they apply to the specific slow-running queries you are having issues with. I assume you have already profiled your queries in the bundled monex application. Here is some documentation on this topic to help you get started: http://exist-db.org/exist/apps/doc/indexing.xml?field=all&id=D3.19#check-usage
On a typical local instance the profiler is available at: http://localhost:8080/exist/apps/monex/profiling.html
[EXPR1 and EXPR2] is slower than [EXPR1][EXPR2] if either EXPR1 or EXPR2 can use an index.
Changing the order of predicates can also improve the performance of a query. It is usually best to order them so that the predicate that reduces the node set the most comes first.
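As a small illustration of the predicate split, here is a sketch on an in-memory fragment. The element and attribute names are made up for this example. In-memory nodes are not indexed, so this only shows that both forms select the same nodes; whether the split form is actually faster has to be checked in the monex profiler against your own indexed data.

```xquery
xquery version "3.1";

(: Hypothetical sample data, for illustration only. :)
let $data :=
    <root>
        <a b="c" d="x"/>
        <a b="c" d="y"/>
        <a b="z" d="x"/>
    </root>
return (
    (: one predicate combining both conditions with "and" :)
    count($data/a[@b = 'c' and @d = 'x']),
    (: two chained predicates, the more selective one first :)
    count($data/a[@b = 'c'][@d = 'x'])
)
(: both counts are 1 :)
```

Note that the two forms are interchangeable only for boolean conditions. If one of the predicates is positional (numeric), chaining changes the meaning.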
Let's assume you have a new range index on @b. If a[@b = 'c'] shows up in monex with "no index", rewriting it to a/@b[. = 'c']/.. might work.
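The rewrite can be checked for equivalence on a small in-memory fragment before profiling it. This is only a sketch with made-up sample data, and it says nothing about index usage, which still needs to be confirmed in monex.

```xquery
xquery version "3.1";

(: Hypothetical sample data, for illustration only. :)
let $data :=
    <root>
        <a b="c">one</a>
        <a b="x">two</a>
        <a>three</a>
    </root>
return (
    (: original form: filter the element by its attribute :)
    $data/a[@b = 'c']/string(),
    (: rewritten form: filter the attribute, then step back to its parent :)
    $data/a/@b[. = 'c']/../string()
)
(: both expressions return "one" :)
```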
The optimiser is currently not able to work on sequences of queries. Let's assume two queries, QUERY1 and QUERY2, where at least one of them would make use of an index and could therefore benefit from optimisation.
(QUERY1, QUERY2)
is not optimised, but refactoring it into separate let-assignments will work:
let $first := QUERY1
let $second := QUERY2
return ($first, $second)
You will see that the queries can now be optimised and therefore perform much better.
Here we have two collections which both have an index on a. The union of those is then queried with the same PREDICATE:
($collection-a//a[PREDICATE] | $collection-b//a[PREDICATE])
should be rewritten to
($collection-a | $collection-b)//a[PREDICATE]
This drastically improved performance in testing.
Interestingly, we observed that
($collection-a, $collection-b)//a[PREDICATE]
which produces the same results, was slightly slower.
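To convince yourself that the three forms select the same nodes before comparing their timings, a sketch like the following can help. The two variables are bound to made-up in-memory fragments that stand in for the collections, and the predicate is an arbitrary example, so no index is involved here.

```xquery
xquery version "3.1";

(: Hypothetical stand-ins for two collections, for illustration only. :)
let $collection-a := <docs><a n="1"/><a n="2"/></docs>
let $collection-b := <docs><a n="1"/><a n="3"/></docs>
return (
    (: predicate applied per collection, results united afterwards :)
    count($collection-a//a[@n = "1"] | $collection-b//a[@n = "1"]),
    (: union of the collections first, predicate applied once :)
    count(($collection-a | $collection-b)//a[@n = "1"]),
    (: sequence instead of union :)
    count(($collection-a, $collection-b)//a[@n = "1"])
)
(: all three counts are 2 :)
```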
Function calls within a predicate can prevent the optimiser from recognising that an index can be used, which slows down queries.
If we, for example, want to select all a elements that have a child element b whose attribute c is not equal to "value":
a[b[not(@c = "value")]]
This can be rewritten to
a[b[@c ne "value"]]
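This benefits from optimisation and index usage. Note that the two forms express the same selection only if every b has a c attribute: not(@c = "value") is also true for a b without @c, whereas @c ne "value" is not.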
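One caveat is worth checking before applying this rewrite: the two predicates differ for b elements that have no c attribute at all. The following sketch, on made-up in-memory data, shows the difference.

```xquery
xquery version "3.1";

(: Hypothetical sample data: the third a has a b child without a c attribute. :)
let $data :=
    <root>
        <a id="1"><b c="value"/></a>
        <a id="2"><b c="other"/></a>
        <a id="3"><b/></a>
    </root>
return (
    (: not(...) is true when @c is missing, so a 2 and a 3 are selected :)
    string-join($data/a[b[not(@c = "value")]]/@id, ","),
    (: "ne" yields the empty sequence when @c is missing, so only a 2 is selected :)
    string-join($data/a[b[@c ne "value"]]/@id, ",")
)
(: returns "2,3" and "2" :)
```

If every b in your data carries the attribute, both forms return the same nodes and the rewrite is safe.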
Finally, while refactoring existing code we found a pattern that should generally be avoided. When calling ft:query, the first parameter should be neither a union nor a sequence of mixed nodes.
ft:query(a | b, SEARCH)
and, even more so,
ft:query((a,b), SEARCH)
are much slower than
(ft:query(a, SEARCH) | ft:query(b, SEARCH))
or
(ft:query(a, SEARCH), ft:query(b, SEARCH))
The above are the findings of many test sessions. The test cycle is:
- start the profiler in monex
- run the original query
- look at the index usage tab in monex profiler and the timing for the query under test
- note down the values and reset the profiler
- refactor the query
- compare the previous values with the new result in monex' profiler
- repeat from the refactoring step until satisfied
Tests can be done
- in a stripped-down main-module that is evaluated in eXide
- or by changing the query in-place in your application in a test environment