I set up this small Gist to demonstrate a possible bug in Solr's Re-ranking / LTR / QueryComponent.
When a query combines re-ranking and sorting against a multi-shard collection in SolrCloud, the results come back in seemingly random order.
This will launch a ZooKeeper node and two Solr nodes:
$ docker-compose up -d
We use the films example data set for this. We create the two-shard films collection and load the example data:
$ curl --user solr:solr \
"http://localhost:8983/solr/admin/collections?action=CREATE&name=films&numShards=2&collection.configName=_default"
$ curl -X POST -H 'Content-type:application/json' \
--data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' \
"http://localhost:8983/solr/films/schema"
$ curl -X POST -H 'Content-type:application/json' \
--data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' \
"http://localhost:8983/solr/films/schema"
$ docker exec -it solr_1 sh -c \
"java -jar -Dc=films -Dauto /opt/solr/example/exampledocs/post.jar /opt/solr/example/films/*.json"
The links below point to the running Docker ensemble on localhost:8983.
- Query animation films: /select?q=animation
- Query animation films and boost fantasy films: /select?q=animation&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}&rqq=fantasy
This works like a charm: fantasy films are boosted by a factor of 1000 (re-ranked docs have a score > 1000).
- Sort films by id ascending but retrieve results from shard1 only: /select?q=animation&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}&rqq=fantasy&sort=id asc&shards=shard1
The result set is first sorted by id; the first 10 documents are then re-ranked by how well they match fantasy.
- Now extend the query above to both shards: /select?q=animation&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}&rqq=fantasy&sort=id asc
The result set appears to be randomly sorted.
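To repeat the comparison from the command line, the query parameters need URL encoding, since the rq value contains spaces, braces and a `$`. The following is a minimal sketch and not part of the original setup. `rerank_ids` and `tail_is_sorted` are hypothetical helper names. The parameters `fl=id`, `rows=50`, `wt=csv` and `csv.header=false` are my additions to get one id per line. The check assumes that, as in the single-shard case, every row after the first reRankDocs rows should stay in ascending id order (compared bytewise), and that ids contain no characters that the CSV writer would quote.

```shell
# Base URL of the films collection from the setup above.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/films}"

# rerank_ids [PARAM...]
# Runs the sorted re-rank query and prints the ids of the hits, one per line.
# Each PARAM is one extra "name=value" request parameter, e.g. 'shards=shard1'.
rerank_ids() {
  # Rewrite every PARAM into a "--data-urlencode PARAM" pair for curl.
  for param in "$@"; do
    set -- "$@" --data-urlencode "$param"
    shift
  done
  # Single quotes keep the shell from expanding $rqq; Solr resolves it.
  curl -sG "$SOLR_URL/select" \
    --data-urlencode 'q=animation' \
    --data-urlencode 'rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=1000}' \
    --data-urlencode 'rqq=fantasy' \
    --data-urlencode 'sort=id asc' \
    --data-urlencode 'fl=id' \
    --data-urlencode 'rows=50' \
    --data-urlencode 'wt=csv' \
    --data-urlencode 'csv.header=false' \
    "$@"
}

# tail_is_sorted N
# Reads ids on stdin, skips the first N (the assumed re-ranked window) and
# succeeds only if the remaining ids are in ascending byte order.
tail_is_sorted() {
  tail -n "+$(( $1 + 1 ))" | LC_ALL=C sort -c 2>/dev/null
}

# Usage, against the running ensemble:
#   rerank_ids 'shards=shard1' | tail_is_sorted 10 && echo "shard1: tail sorted"
#   rerank_ids | tail_is_sorted 10 || echo "all shards: tail NOT sorted"
```

The window size is left as an argument because it is not obvious how many rows a multi-shard request is allowed to re-rank. If the check fails for the single-shard query as well, the assumption about the expected order is wrong, not Solr.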