@lgueye
Created February 7, 2012 14:45
elasticsearch : dealing with case and accents
# delete index (will print an error if 'my_index' doesn't exist, you can safely ignore it)
curl -XDELETE 'http://localhost:9200/my_index'
# create index with its settings
curl -XPOST 'http://localhost:9200/my_index' -d '{
"index.analysis.analyzer.default.type":"custom",
"index.analysis.analyzer.default.tokenizer":"standard",
"index.analysis.analyzer.default.filter.0":"lowercase",
"index.analysis.analyzer.default.filter.1":"asciifolding"
}'
# check index analyzer behaviour
# note that both the lowercase and asciifolding filters are applied at index time
# 2 tokens are produced: 'ingenieur' and 'java'
curl -XGET 'localhost:9200/my_index/_analyze?text=Ingénieur+Java'
# add data
curl -XPUT 'http://localhost:9200/my_index/my_type/1' -d '{"reference":"ADV-REF-00000001", "title":"Ingénieur Java"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/2' -d '{"reference":"ADV-REF-00000002", "title":"Conservateur documentaliste"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/3' -d '{"reference":"ADV-REF-00000003", "title":"Technicien qualité validation H/F"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/4' -d '{"reference":"ADV-REF-00000004", "title":"Valet de chambre"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/5' -d '{"reference":"ADV-REF-00000005", "title":"Ingénieur PHP"}'
# search data
# the queries below should all return the same results (2 hits)
# URLs are quoted so the shell does not try to expand the trailing '*'
curl 'http://localhost:9200/my_index/my_type/_search?q=Ingénieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingénieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingenieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=Ingén*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingén*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingen*'
@alexol91

You can try the asciifolding token filter:

analysis-asciifolding

Or replace each accented character with the single-character wildcard `?`. For example, to find "camión":
{ "query": { "query_string": { "analyze_wildcard": true, "query": "cami?n" } } }
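The substitution above can be automated. This is only a rough sketch: `wildcard_accents` and `build_query` are hypothetical helper names (not part of the gist or of Elasticsearch), the list of accented letters is a small illustrative sample, and the search term is assumed to contain no `"` or `\` characters that would need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: replace a handful of accented letters with the
# single-character wildcard '?'. Each letter is matched as a literal
# string, so the substitution does not depend on the sed locale.
wildcard_accents() {
  printf '%s' "$1" | sed \
    -e 's/é/?/g' -e 's/è/?/g' -e 's/ê/?/g' -e 's/ë/?/g' \
    -e 's/à/?/g' -e 's/â/?/g' \
    -e 's/î/?/g' -e 's/ï/?/g' \
    -e 's/ó/?/g' -e 's/ô/?/g' \
    -e 's/ù/?/g' -e 's/û/?/g' -e 's/ü/?/g' \
    -e 's/ç/?/g'
}

# Hypothetical helper: wrap the rewritten term in a query_string body
# with analyze_wildcard enabled, as in the comment above.
# Assumption: the term needs no JSON escaping.
build_query() {
  printf '{ "query": { "query_string": { "analyze_wildcard": true, "query": "%s" } } }\n' \
    "$(wildcard_accents "$1")"
}

build_query "camión"
```

The printed body could then be piped to the search endpoint, for example with `curl -XGET 'http://localhost:9200/my_index/_search' -H 'Content-Type: application/json' -d @-` (the header is required on recent Elasticsearch versions). Since `?` matches any single character, this trades precision for recall: `cami?n` would also match a term such as `camien`.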
