Last active
October 10, 2018 06:54
ICU folding demo.
Assumes Elasticsearch 0.90.0 with the ICU plugin 1.9.0 installed.
#!/bin/sh
curl -X DELETE 'localhost:9200/i/'
curl -X POST 'localhost:9200/i/' -d '{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0,
    "analysis" : {
      "analyzer" : {
        "icu_folding" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["icu_folding"]
        },
        "ascii_folding" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["asciifolding","lowercase"]
        }
      }
    }
  }
}'
# exit  # uncomment to stop here and run the tests below manually
# ASCII folding and ICU folding give the same result here (except for lowercasing, which has to be added explicitly to the ascii_folding analyzer)
curl 'localhost:9200/i/_analyze?analyzer=icu_folding&pretty=true' -d 'Běloučký kůň úpěl ódy!'
curl 'localhost:9200/i/_analyze?analyzer=ascii_folding&pretty=true' -d 'Běloučký kůň úpěl ódy!'
# ASCII folding works only for some of these characters...
curl 'localhost:9200/i/_analyze?analyzer=icu_folding&pretty=true' -d 'dž ¼ № ℃ ™ Æ Ȣ ffi '
curl 'localhost:9200/i/_analyze?analyzer=ascii_folding&pretty=true' -d 'dž ¼ № ℃ ™ Æ Ȣ ffi '
# ASCII folding is a no-op here... ICU folding rocks!
curl 'localhost:9200/i/_analyze?analyzer=icu_folding&pretty=true' -d 'º o ª a ℹ i ℇ e'
curl 'localhost:9200/i/_analyze?analyzer=ascii_folding&pretty=true' -d 'º o ª a ℹ i ℇ e'
For those who are lazy or simply can't try this right away, the differences are:
icu_folding: dž -> dz, ¼ -> 1/4, № -> no ...
ascii_folding: dž -> dz, ¼ -> ¼, № -> № ...
icu_folding: º -> o, o -> o, ª -> a, a -> a, ℹ -> i, i -> i, ℇ -> e, e -> e
ascii_folding: º -> º, o -> o, ª -> ª, a -> a, ℹ -> ℹ, i -> i, ℇ -> ℇ, e -> e
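
A compact listing like the one above can be pulled out of the raw `_analyze` responses. What follows is only a minimal sketch, assuming the index `i` and the two analyzers from the gist already exist, that the response carries `"token" : "..."` fields, and that `grep -o` is available (it is in GNU and BSD grep, but is not POSIX). The helper names `tokens_only` and `compare_folding` and the `ES_HOST` variable are hypothetical, not part of the gist.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original gist.
# ES_HOST is an assumed variable; it defaults to the host used in the gist.
ES_HOST="${ES_HOST:-localhost:9200}"

# Read an _analyze JSON response on stdin and print the token values
# on a single line, separated by spaces.
# Assumption: token text contains no escaped double quotes.
tokens_only() {
  grep -o '"token" *: *"[^"]*"' \
    | sed 's/^"token" *: *"//; s/"$//' \
    | paste -sd ' ' -
}

# Send the same text through both analyzers and print one line per analyzer.
compare_folding() {
  for analyzer in icu_folding ascii_folding; do
    printf '%s: ' "$analyzer"
    curl -s "$ES_HOST/i/_analyze?analyzer=$analyzer" -d "$1" | tokens_only
  done
}
```

With Elasticsearch running and the index created, `compare_folding 'º o ª a ℹ i ℇ e'` should print one line of tokens per analyzer, which makes the two filters easy to compare by eye.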