Add a suggester to your index mapping:
{
  "mappings" : {
    "hotel" : {
      "properties" : {
        "name" : { "type" : "string" },
        "city" : { "type" : "string" },
        "name_suggest" : {
          "type" : "completion"
        }
      }
    }
  }
}
Add suggestions to your document when indexing:
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : "Mercure Hotel Munich"
}
This will only suggest a match if the user input matches the suggestion from left to right, i.e. starts with 'm'. To provide better matches you can provide multiple suggestion values:
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [
      "Mercure Hotel Munich",
      "Mercure Munich"
    ]
  }
}
Ask for suggestions:
curl -X POST localhost:9200/hotels/_suggest -d '
{
  "hotels" : {
    "text" : "m",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}'
You can provide a payload when indexing to include the ID of the document with the results, so you can look up the matching document.
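As a sketch, assuming the name_suggest mapping enables payloads ("payloads" : true) and using a made-up hotel_id payload key:

```shell
# Assumes the name_suggest completion field was mapped with "payloads" : true;
# the hotel_id key is just an illustrative name.
curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [ "Mercure Hotel Munich", "Mercure Munich" ],
    "payload" : { "hotel_id" : 1 }
  }
}'
```

The payload is then returned verbatim alongside each matching suggestion in the _suggest response.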
You can add a weight to a suggestion. This is likely most useful for weighting frequently used items more heavily so they appear higher in the suggestion results.
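A sketch of a weighted suggestion (the weight value 10 is arbitrary; higher-weighted suggestions rank higher):

```shell
# The weight value here is arbitrary, e.g. derived from how often the item is used.
curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [ "Mercure Hotel Munich", "Mercure Munich" ],
    "weight" : 10
  }
}'
```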
- incredibly fast
- provides a lot of control by providing multiple suggestion input matches
- can maintain suggestions in a separate index so the suggestions can be easily updated if we improve the algorithm for creating the suggestion input matches or update weights as new lists are created and item use frequency increases
- potentially hard to create suggestion terms programmatically in a way that will match, since it is more of an autocomplete: the user input must match a suggestion's input from left to right
Index content as multi-field with the normal content and a content.autocomplete field that uses an ngram filter.
Multi-field indexing supports indexing a field in multiple ways so it can be searched by and indexed by different analyzers.
The Edge NGram filter will only produce n-grams from the beginning of a word; the normal NGram filter produces n-grams throughout the word.
Can use the NGram filter with other tokenizers and filters to split words, lowercase, do stemming, etc. before generating n-grams.
Should use a min_gram of 2 and a max_gram of something largish, like 15.
The ngram filter should definitely be used at index time; it may not be appropriate on the search term, where a standard tokenizer may be sufficient. See the jontai.me post.
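Putting the pieces above together, a sketch of such an index setup (the analyzer and filter names are made up; this uses the edge_ngram variant and the 1.x-era index_analyzer/search_analyzer mapping syntax — swap in ngram to produce grams throughout each word):

```shell
# Sketch: "autocomplete" and "autocomplete_filter" are arbitrary names.
# The sub-field is ngram-analyzed at index time only; searches use the
# standard analyzer, per the note above.
curl -X PUT localhost:9200/hotels -d '
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "autocomplete_filter" : {
          "type" : "edge_ngram",
          "min_gram" : 2,
          "max_gram" : 15
        }
      },
      "analyzer" : {
        "autocomplete" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings" : {
    "hotel" : {
      "properties" : {
        "name" : {
          "type" : "string",
          "fields" : {
            "autocomplete" : {
              "type" : "string",
              "index_analyzer" : "autocomplete",
              "search_analyzer" : "standard"
            }
          }
        }
      }
    }
  }
}'
```

Searches then go against the sub-field, e.g. a match query on name.autocomplete.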
- will split user input by word and ngram tokenize each one so the matches do not have to be as exact as with the completion suggester
- seems like the standard approach, especially before completion suggester was released
- not as fast as completion suggester
It sounds like an NGram filter is the way to go in the beginning. As knowledge is gained we may want to try the completion suggester, but the suggester seems to require a more complex setup and understanding. There are some variables that can be tuned with the NGram filter to adjust relevancy of results.