## Analyzer for SKUs in Elasticsearch

The analyzer below strips out any characters matched by the regex in the `pattern_replace` character filter. In this case it removes `#`, `_`, `.`, `-`, and any whitespace; you can adapt the regex as appropriate. Because no lowercase filter is applied, the search WILL be case sensitive, and because the `keyword` tokenizer is used, the text will not be split into "words".
```
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "[#_.-]|\\s",
          "replacement": ""
        }
      }
    }
  }
}
```
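The request above only defines the analyzer; to use it, reference it from a field mapping when the index is created. The sketch below is one way to do that, assuming a hypothetical `sku` field (the field name and `text` type are illustrative and not part of the original gist):

```
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": [ "my_char_filter" ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "[#_.-]|\\s",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "sku": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
```

With this mapping, both indexed SKU values and query strings run through the same analyzer, so `MYSKU-123 456` and `MYSKU123456` normalize to the same token (case differences, however, still matter).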
You can simulate how the analyzer works by using the call below:
```
POST my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Mysku123 456-789"
}
```
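For the sample text above, the whitespace and hyphen are removed and the keyword tokenizer emits a single token. The response should look roughly like the following; the offset and position values are illustrative, since exact offsets depend on how the char filter maps positions back to the original text:

```
{
  "tokens": [
    {
      "token": "Mysku123456789",
      "start_offset": 0,
      "end_offset": 16,
      "type": "word",
      "position": 0
    }
  ]
}
```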