Created February 7, 2012 14:45
elasticsearch : dealing with case and accents
# delete index (will print an error if 'my_index' doesn't exist, you can safely ignore it)
curl -XDELETE 'http://localhost:9200/my_index'
# create index with its settings
curl -XPOST 'http://localhost:9200/my_index' -d '{
"index.analysis.analyzer.default.type":"custom",
"index.analysis.analyzer.default.tokenizer":"standard",
"index.analysis.analyzer.default.filter.0":"lowercase",
"index.analysis.analyzer.default.filter.1":"asciifolding"
}'
# check index analyzer behaviour
# note that the lowercase and asciifolding filters are applied at index time
# 2 tokens are produced: 'ingenieur' and 'java'
curl -XGET 'localhost:9200/my_index/_analyze?text=Ingénieur+Java'
# add data
curl -XPUT 'http://localhost:9200/my_index/my_type/1' -d '{"reference":"ADV-REF-00000001", "title":"Ingénieur Java"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/2' -d '{"reference":"ADV-REF-00000002", "title":"Conservateur documentaliste"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/3' -d '{"reference":"ADV-REF-00000003", "title":"Technicien qualité validation H/F"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/4' -d '{"reference":"ADV-REF-00000004", "title":"Valet de chambre"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/5' -d '{"reference":"ADV-REF-00000005", "title":"Ingénieur PHP"}'
# search data
# the queries below should all return the same results (2 hits)
# URLs are quoted so the shell does not expand '?' and '*'
curl 'http://localhost:9200/my_index/my_type/_search?q=Ingénieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingénieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingenieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=Ingén*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingén*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingen*'
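The `_analyze` check in the script reports the tokens 'ingenieur' and 'java' for the text "Ingénieur Java". As a rough local illustration of that normalization, the sketch below folds a handful of French accented letters to ASCII and lowercases the result. `fold_lower` is a hypothetical helper, not part of Elasticsearch; it covers only the characters listed and does not tokenize, so it approximates the character handling of the lowercase and asciifolding filters rather than reproducing them.

```shell
# Hypothetical helper: approximates lowercase + asciifolding for a few
# French accented letters. Assumes the script and input are UTF-8.
fold_lower() {
  printf '%s\n' "$1" | sed -E \
    -e 's/(à|â|ä|À|Â|Ä)/a/g' \
    -e 's/(é|è|ê|ë|É|È|Ê|Ë)/e/g' \
    -e 's/(î|ï|Î|Ï)/i/g' \
    -e 's/(ô|ö|Ô|Ö)/o/g' \
    -e 's/(ù|û|ü|Ù|Û|Ü)/u/g' \
    -e 's/(ç|Ç)/c/g' \
    | tr '[:upper:]' '[:lower:]'   # remaining ASCII letters to lowercase
}

fold_lower 'Ingénieur Java'   # prints: ingenieur java
```

Each accented letter is written out as an explicit alternative instead of a bracket expression, so the substitution does not depend on the locale sed runs in.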
You can try this:
analysis-asciifolding
Or replace the accented characters with ?, for example:
Find: "camión"
{ "query": { "query_string": { "analyze_wildcard": true, "query": "cami?n" } } }
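The replacement suggested above can be scripted. This is a minimal sketch, assuming UTF-8 input and only the lowercase accented letters listed in the pattern; `accents_to_wildcard` is a hypothetical helper name, not an Elasticsearch or curl feature.

```shell
# Hypothetical helper: replaces a few accented letters with '?', which
# query_string treats as a single-character wildcard.
accents_to_wildcard() {
  printf '%s\n' "$1" | sed -E 's/(á|à|â|ä|é|è|ê|ë|í|î|ï|ó|ô|ö|ú|ù|û|ü|ç|ñ)/?/g'
}

# Build the query body for the term "camión".
term=$(accents_to_wildcard 'camión')
printf '{ "query": { "query_string": { "analyze_wildcard": true, "query": "%s" } } }\n' "$term"
```

The printed body could then be sent to the index's `_search` endpoint with `curl -d`. Note that `?` matches any single character, so "cami?n" would also match unrelated terms such as "camisn" if they exist in the index.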