# delete index (will print an error if 'my_index' doesn't exist, you can safely ignore it)
curl -XDELETE 'http://localhost:9200/my_index'
# create index with its settings
curl -XPOST 'http://localhost:9200/my_index' -d '{
  "index.analysis.analyzer.default.type":"custom",
  "index.analysis.analyzer.default.tokenizer":"standard",
  "index.analysis.analyzer.default.filter.0":"lowercase",
  "index.analysis.analyzer.default.filter.1":"asciifolding"
}'
# check index analyzer behaviour
# we can note that lowercase filter and asciifolding filters work at index phase
# 2 tokens are stored : 'ingenieur' and 'java'
curl -XGET 'localhost:9200/my_index/_analyze?text=Ingénieur+Java'
# add data
curl -XPUT 'http://localhost:9200/my_index/my_type/1' -d '{"reference":"ADV-REF-00000001", "title":"Ingénieur Java"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/2' -d '{"reference":"ADV-REF-00000002", "title":"Conservateur documentaliste"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/3' -d '{"reference":"ADV-REF-00000003", "title":"Technicien qualité validation H/F"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/4' -d '{"reference":"ADV-REF-00000004", "title":"Valet de chambre"}'
curl -XPUT 'http://localhost:9200/my_index/my_type/5' -d '{"reference":"ADV-REF-00000005", "title":"Ingénieur PHP"}'
# search data
# the queries below should all return the same results (2 hits)
curl 'http://localhost:9200/my_index/my_type/_search?q=Ingénieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingénieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingenieur*'
curl 'http://localhost:9200/my_index/my_type/_search?q=Ingén*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingén*'
curl 'http://localhost:9200/my_index/my_type/_search?q=ingen*'
Hi,
Yes, the key is how the accents are encoded. Instead of "curl http://localhost:9200/my_index/my_type/_search?q=Ingén*", use "curl http://localhost:9200/my_index/my_type/_search?q=Ing%C3%A9n*" (é is the two bytes C3 A9 in UTF-8, so each byte gets its own % escape).
Cheers
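In case it helps, here is a rough sketch of how the percent-encoding mentioned above could be produced from the shell instead of by hand. `urlencode` is a made-up helper name, not something shipped with curl or Elasticsearch, and it assumes the term reaches the shell as UTF-8. It escapes every byte outside the unreserved ASCII set, so a wildcard `*` is best appended after the encoded term.

```shell
# Hypothetical helper: percent-encode the bytes of a UTF-8 string.
# Unreserved ASCII (0-9 A-Z a-z - . _ ~) is kept, everything else becomes %XX.
urlencode() (
  out=""
  # od prints each input byte as two lowercase hex digits
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$hex" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        # unreserved byte: turn the hex value back into the character
        out="$out$(printf "\\$(printf '%03o' "0x$hex")")" ;;
      *)
        # any other byte: emit it as an uppercase %XX escape
        out="$out%$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
  printf '%s\n' "$out"
)

# Possible usage against the gist's index (needs a running node):
# curl "http://localhost:9200/my_index/my_type/_search?q=$(urlencode 'Ingén')*"
```

curl's own `-G --data-urlencode` options are another way to get the same encoding without a helper.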
The problem is the opposite: I also need "Ingen" to match. I tried it like this:
curl -XGET 'http://172.16.181.128:9200/sandbox/tests/_search' -d '{
  "query" : {
    "text" : {
      "user" : {
        "query" : "ingen",
        "type" : "boolean",
        "operator" : "AND",
        "fuzziness" : "0.5"
      }
    }
  }
}'
And it works, but only because of the fuzzy approximation: if there are too many differences between the words, it no longer matches, so this does not solve the accent problem in general. Does anyone know how to simply index while IGNORING accents?
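A possible explanation, assuming an Elasticsearch version where wildcard terms in a query string are not run through the analyzer by default: the index side already ignores accents (the asciifolding filter in the gist's settings does that), but a term like `ingén*` is compared to the stored token `ingenieur` without being folded first. One rough workaround is to lowercase and fold the term on the client before sending it. The sketch below does that in shell; `fold_term` is a made-up helper, and its mapping is hand-written and only covers a handful of French letters.

```shell
# Hypothetical helper: lowercase a term and strip a few French accents,
# roughly mimicking the lowercase + asciifolding filters on the client side.
# The mapping is deliberately partial; extend it for other letters.
fold_term() {
  printf '%s\n' "$1" | sed \
    -e 's/à/a/g' -e 's/â/a/g' -e 's/ä/a/g' \
    -e 's/é/e/g' -e 's/è/e/g' -e 's/ê/e/g' -e 's/ë/e/g' \
    -e 's/î/i/g' -e 's/ï/i/g' \
    -e 's/ô/o/g' -e 's/ö/o/g' \
    -e 's/ù/u/g' -e 's/û/u/g' -e 's/ü/u/g' \
    -e 's/ç/c/g' \
    -e 's/À/a/g' -e 's/Â/a/g' -e 's/É/e/g' -e 's/È/e/g' -e 's/Ê/e/g' -e 's/Ç/c/g' \
    | tr 'A-Z' 'a-z'
}

# Possible usage against the gist's index (needs a running node):
# curl "http://localhost:9200/my_index/my_type/_search?q=$(fold_term 'Ingén')*"
```

Since the folded term is plain ASCII, it also sidesteps the URL-encoding question.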
You can try this:
Or replace the accented characters with the single-character wildcard ?. Example:
Find: "camión"
{ "query": { "query_string": { "analyze_wildcard": true, "query": "cami?n" } } }
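The substitution in the example above can be scripted. This is only a sketch: `accents_to_wildcard` is a hypothetical helper, it assumes UTF-8 input with precomposed characters, and it replaces every non-ASCII character (not just accented letters) with one `?`.

```shell
# Hypothetical helper: replace each non-ASCII character of a UTF-8 term
# with the single-character wildcard '?'.
accents_to_wildcard() (
  out=""
  # od prints each input byte as two lowercase hex digits
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$hex" in
      [c-f]?)  out="$out?" ;;   # UTF-8 lead byte: one '?' per character
      [89ab]?) ;;               # continuation byte: already counted
      *)       out="$out$(printf "\\$(printf '%03o' "0x$hex")")" ;;  # ASCII byte: keep
    esac
  done
  printf '%s\n' "$out"
)

# Possible usage: build the query body shown above for an arbitrary term.
# printf '{ "query": { "query_string": { "analyze_wildcard": true, "query": "%s" } } }\n' \
#   "$(accents_to_wildcard 'camión')"
```

Note that `?` matches any single character, so "cami?n" would also match a word such as "camign" if one were indexed.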
Did you manage to sort out this problem? I'm facing the same thing... no hits at all.