Last active May 14, 2022 15:09
Django-haystack Whoosh backend with character folding
# -*- coding: utf-8 -*-
"""
Whoosh backend for haystack that implements character folding, as per
http://packages.python.org/Whoosh/stemming.html#character-folding .

Tested with Haystack 2.4.0 and Whoosh 2.7.0

To use, put this file on your path and add it to your haystack settings, eg.

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'folding_whoosh_backend.FoldingWhooshEngine',
            'PATH': 'path-to-whoosh-index',
        },
    }

"""

from haystack.backends.whoosh_backend import WhooshEngine, WhooshSearchBackend
from whoosh.analysis import CharsetFilter, StemmingAnalyzer
from whoosh.support.charset import accent_map
from whoosh.fields import TEXT


class FoldingWhooshSearchBackend(WhooshSearchBackend):

    def build_schema(self, fields):
        schema = super(FoldingWhooshSearchBackend, self).build_schema(fields)

        # schema is a (content_field_name, whoosh Schema) tuple; replace the
        # analyzer on every TEXT field with one that also folds accents.
        for name, field in schema[1].items():
            if isinstance(field, TEXT):
                field.analyzer = StemmingAnalyzer() | CharsetFilter(accent_map)

        return schema


class FoldingWhooshEngine(WhooshEngine):
    backend = FoldingWhooshSearchBackend
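
The effect of character folding can be illustrated without Whoosh or Haystack. The following is a minimal standard-library sketch, not Whoosh's `accent_map` implementation; `fold_accents`, `documents`, `index` and `search` are hypothetical names used only for illustration. It assumes that folding is applied both to indexed terms and to the query, which is also why an existing index would need to be rebuilt before unaccented queries can match.

```python
import unicodedata


def fold_accents(text):
    """Approximate character folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy index: folded term -> original documents.
documents = ["café", "cafe", "naïve"]
index = {}
for doc in documents:
    index.setdefault(fold_accents(doc.lower()), []).append(doc)


def search(query):
    # The query is folded the same way as the indexed terms.
    return index.get(fold_accents(query.lower()), [])


print(search("cafe"))
print(search("café"))
```

Both queries hit the same folded key, so accented and unaccented spellings return the same documents.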
@paweloque, no, you should just be able to change the backend. Make sure you reindex the content after doing this.
Great stuff, thanks @gregplaysguitar
If you use an EdgeNgramField, you can use this:

    if isinstance(field, NGRAMWORDS):
        field.analyzer = StemmingAnalyzer() | NgramFilter(minsize=X) | CharsetFilter(accent_map)

Or, maybe even better, keep the original analyzer and add the filter like this:

    field.analyzer = field.analyzer | CharsetFilter(accent_map)
I still cannot search using words without accents. For example, I would like to search for 'cafe' and get back results like 'café' and 'cafe'.
Do I have to do something additional, like changing the index template?