The new index will test this indexing strategy. The same tokenizer and filters are run both at index time and on users' queries.
This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions:
- Periods (dots) that are not followed by whitespace are kept as part of the token, so Internet domain names are preserved.
- The "@" character is among the set of token-splitting punctuation, so email addresses are not preserved as single tokens.
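The rules above can be approximated with a short regular-expression sketch. This is an illustration only, not the tokenizer's actual implementation: the names `TOKEN_RE` and `tokenize` are hypothetical, and the sketch makes the narrower assumption that a period is kept only when it is immediately followed by a letter, digit, or underscore.

```python
import re

# Hypothetical approximation of the rules described above.
# A token is a run of word characters, optionally joined by periods
# that are immediately followed by another word character.
# Assumption: a period followed by other punctuation is discarded,
# which is stricter than "not followed by whitespace".
TOKEN_RE = re.compile(r"\w+(?:\.\w+)*")


def tokenize(text):
    """Split text into tokens, discarding whitespace and punctuation."""
    return TOKEN_RE.findall(text)


# The same function is applied to indexed text and to user queries,
# so both sides produce comparable tokens.
indexed = tokenize("Visit example.com today. Mail bob@example.com!")
query = tokenize("bob@example.com")

print(indexed)
print(query)
```

In this sketch the domain name `example.com` survives as one token, the sentence-ending period after `today` is dropped, and the email address is split at "@" into `bob` and `example.com`.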