Last active
May 14, 2022 15:09
Django-haystack Whoosh backend with character folding
# -*- coding: utf-8 -*-

"""
Whoosh backend for haystack that implements character folding, as per
http://packages.python.org/Whoosh/stemming.html#character-folding .

Tested with Haystack 2.4.0 and Whoosh 2.7.0

To use, put this file on your path and add it to your haystack settings, eg.

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'folding_whoosh_backend.FoldingWhooshEngine',
            'PATH': 'path-to-whoosh-index',
        },
    }

"""

from haystack.backends.whoosh_backend import WhooshEngine, WhooshSearchBackend
from whoosh.analysis import CharsetFilter, StemmingAnalyzer
from whoosh.support.charset import accent_map
from whoosh.fields import TEXT


class FoldingWhooshSearchBackend(WhooshSearchBackend):

    def build_schema(self, fields):
        # The parent returns a (content_field_name, schema) tuple.
        schema = super(FoldingWhooshSearchBackend, self).build_schema(fields)

        # Replace the analyzer on every TEXT field with one that also folds accents.
        for name, field in schema[1].items():
            if isinstance(field, TEXT):
                field.analyzer = StemmingAnalyzer() | CharsetFilter(accent_map)

        return schema


class FoldingWhooshEngine(WhooshEngine):
    backend = FoldingWhooshSearchBackend
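For readers unfamiliar with character folding, the following is a minimal standard-library sketch of the idea, independent of Whoosh and Haystack. It is not how `CharsetFilter(accent_map)` is implemented; the helper names `fold_accents` and `matches` are hypothetical, and the Unicode-decomposition approach is an assumption chosen for illustration. It only approximates what a folding table does.

```python
import unicodedata


def fold_accents(text):
    """Approximate character folding (illustrative, not Whoosh's accent_map).

    Decompose each character with NFKD, then drop the combining marks,
    so that "é" becomes "e" and "ñ" becomes "n".
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, document):
    """Hypothetical helper: whole-word match after folding both sides.

    Folding is applied to the query and to the document, mirroring how an
    analyzer runs at both query time and index time.
    """
    folded_words = fold_accents(document).lower().split()
    return fold_accents(query).lower() in folded_words


if __name__ == "__main__":
    print(fold_accents("café"))
    print(matches("cafe", "Un café noir"))
```

Because folding has to be applied to the indexed terms as well as to the query, an index built without the folding analyzer will not match folded queries until it is rebuilt. Characters with no Unicode decomposition (for example "ø") pass through this sketch unchanged, so a dedicated mapping table may cover more cases.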
Great stuff, thanks @gregplaysguitar
If you use an EdgeNgramField, you can use this (NGRAMWORDS is imported from whoosh.fields and NgramFilter from whoosh.analysis):
    if isinstance(field, NGRAMWORDS):
        field.analyzer = StemmingAnalyzer() | NgramFilter(minsize=X) | CharsetFilter(accent_map)
Or, maybe even better, keep the original analyzer and add the filter like this:
    field.analyzer = field.analyzer | CharsetFilter(accent_map)
@paweloque, no, you should just be able to change the backend. Make sure you reindex the content after doing this.