Add a suggester to your index mapping:
{
  "mappings" : {
    "hotel" : {
      "properties" : {
        "name" : { "type" : "string" },
        "city" : { "type" : "string" },
        "name_suggest" : {
          "type" : "completion"
        }
      }
    }
  }
}
Add suggestions to your document when indexing:
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : "Mercure Hotel Munich"
}
This will only suggest a match if the user input matches the suggestion from left to right, i.e. starts with 'm'. To provide better matches you can provide multiple suggestion values:
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [
      "Mercure Hotel Munich",
      "Mercure Munich"
    ]
  }
}
Ask for suggestions:
curl -X POST localhost:9200/hotels/_suggest -d '
{
  "hotels" : {
    "text" : "m",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}'
You can provide a payload when indexing to include the ID of the document with the results, so you can look up the matching document.
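As a sketch, assuming the name_suggest mapping enables payloads ("payloads" : true) and using a made-up hotel_id payload key:

```shell
# Assumes the name_suggest completion field was mapped with "payloads" : true;
# the hotel_id key is just an illustrative name.
curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [ "Mercure Hotel Munich", "Mercure Munich" ],
    "payload" : { "hotel_id" : 1 }
  }
}'
```

The payload is then returned verbatim alongside each matching suggestion in the _suggest response.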
You can add a weight to a suggestion. This is likely most useful for weighting frequently used items more heavily so they appear higher in the suggestion results.
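A sketch of a weighted suggestion (the weight value 10 is arbitrary; higher-weighted suggestions rank higher):

```shell
# The weight value here is arbitrary, e.g. derived from how often the item is used.
curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [ "Mercure Hotel Munich", "Mercure Munich" ],
    "weight" : 10
  }
}'
```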
- incredibly fast
- provides a lot of control by providing multiple suggestion input matches
- can maintain suggestions in a separate index so the suggestions can be easily updated if we improve the algorithm for creating the suggestion input matches or update weights as new lists are created and item use frequency increases
- potentially hard to create suggestion terms programmatically in a way that will match, since it is more of an autocomplete: the user input must match a suggestion's input from left to right
Index content as multi-field with the normal content and a content.autocomplete field that uses an ngram filter.
Multi-field indexing supports indexing a field in multiple ways so it can be searched by and indexed by different analyzers.
The Edge NGram filter will only produce n-grams from the beginning of a word; the normal NGram filter produces n-grams throughout the word.
Can use the NGram filter with other tokenizers and filters to split words, lowercase, do stemming, etc. before generating n-grams.
Should use a min_gram of 2 and a max_gram of something largish, like 15.
The ngram filter should definitely be used at index time; it may not be appropriate on the search term, where a standard tokenizer may be sufficient. See the jontai.me post.
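Putting the pieces above together, a sketch of such an index setup (the analyzer and filter names are made up; this uses the edge_ngram variant and the 1.x-era index_analyzer/search_analyzer mapping syntax — swap in ngram to produce grams throughout each word):

```shell
# Sketch: "autocomplete" and "autocomplete_filter" are arbitrary names.
# The sub-field is ngram-analyzed at index time only; searches use the
# standard analyzer, per the note above.
curl -X PUT localhost:9200/hotels -d '
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "autocomplete_filter" : {
          "type" : "edge_ngram",
          "min_gram" : 2,
          "max_gram" : 15
        }
      },
      "analyzer" : {
        "autocomplete" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings" : {
    "hotel" : {
      "properties" : {
        "name" : {
          "type" : "string",
          "fields" : {
            "autocomplete" : {
              "type" : "string",
              "index_analyzer" : "autocomplete",
              "search_analyzer" : "standard"
            }
          }
        }
      }
    }
  }
}'
```

Searches then go against the sub-field, e.g. a match query on name.autocomplete.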
- will split user input by word and ngram tokenize each one so the matches do not have to be as exact as with the completion suggester
- seems like the standard approach, especially before completion suggester was released
- not as fast as completion suggester
It sounds like an NGram filter is the way to go in the beginning. As knowledge is gained we may want to try the completion suggester, but the suggester seems to require a more complex setup and understanding. There are some variables that can be tuned with the NGram filter to adjust relevancy of results.