This demonstrates an implementation of full-text search for documents in Indexed DB.
- Word-breaking and stemming are used to create a list of terms for each document.
- Document records are annotated with the list of terms when added to the database.
- A multi-entry index on the list of terms is populated.
- A query is similarly processed into a list of terms.
- A join over the terms is implemented using multiple cursors on the index.
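The join in the last step can be pictured without a database. The following is a minimal in-memory sketch, not the demo's code: each array stands in for one cursor on the index restricted to a single term, yielding numeric primary keys in ascending order. The name `intersectSorted` is hypothetical.

```javascript
// Hypothetical helper: intersect several ascending lists of primary keys.
// Each list stands in for one index cursor restricted to a single term.
// In a database version, each position would instead be a real cursor
// advanced with its continue() method.
function intersectSorted(lists) {
  if (lists.length === 0) return [];
  const pos = lists.map(() => 0); // one "cursor" position per term
  const result = [];
  for (;;) {
    // If any cursor is exhausted, no further key can match every term.
    for (let i = 0; i < lists.length; i++) {
      if (pos[i] >= lists[i].length) return result;
    }
    const current = lists.map((list, i) => list[pos[i]]);
    const max = Math.max(...current);
    if (current.every((key) => key === max)) {
      // Every cursor points at the same primary key: it matches all terms.
      result.push(max);
      for (let i = 0; i < pos.length; i++) pos[i]++;
    } else {
      // Advance the lagging cursors until they reach or pass the largest key.
      for (let i = 0; i < lists.length; i++) {
        while (pos[i] < lists[i].length && lists[i][pos[i]] < max) pos[i]++;
      }
    }
  }
}
```

A record appears in the result only if its primary key is present in every term's list, which gives the query AND semantics across its terms.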
The necessity of annotating records with the word list to populate the index is a limitation of the current Indexed DB API. A feature request to support custom indexing is tracked at w3c/IndexedDB#33.
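To make the annotation step concrete, here is a minimal sketch under stated assumptions. The store name `documents`, the `text` and `terms` properties, and the helper names are all hypothetical, and the tokenizer is passed in as a function rather than taken from the demo.

```javascript
// Hypothetical schema setup, meant to run inside the "upgradeneeded"
// handler of an open request. Store, index, and property names are assumptions.
function createSchema(db) {
  const store = db.createObjectStore('documents', { autoIncrement: true });
  // multiEntry: true adds one index entry per element of the `terms` array.
  store.createIndex('terms', 'terms', { multiEntry: true });
  return store;
}

// Return a copy of the record annotated with its de-duplicated term list.
// `tokenize` is any function mapping a string to an array of terms.
function annotate(record, tokenize) {
  const terms = Array.from(new Set(tokenize(record.text)));
  return { ...record, terms };
}

// Add an annotated record. `store` is expected to be an IDBObjectStore
// obtained from a readwrite transaction.
function addDocument(store, record, tokenize) {
  return store.add(annotate(record, tokenize));
}
```

The `terms` property exists only so the index has a key path to read from, which is the limitation described above.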
This is just a demonstration and not production-quality code. Intl.Segmenter is a new API that is not available in all browsers, and it may have changed since this demo was written, in which case the demo will be broken. The stemmer code is unoptimized and too slow for serious use. Sorry about that.
- Intl.Segmenter
- https://github.com/tc39/proposal-intl-segmenter (Chrome 87 or later)
- Porter Stemming Algorithm by Martin Porter
- https://tartarus.org/martin/PorterStemmer/ - algorithm
- https://tartarus.org/martin/PorterStemmer/js.txt - JS implementation
Note that the author no longer recommends this stemmer for practical work; it is used here because it is widely known.
Save the JS implementation as porter-stemmer.js.
FullText.tokenize(text, locale)
Tokenize a string into word stems, for creating a full-text index.
- text: string to tokenize
- locale: locale for tokenizing (e.g. 'en')
Returns an array of word stems.
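As a rough illustration of what such a tokenizer involves, the sketch below segments text with Intl.Segmenter and passes each word through a stemming function. It is a stand-in, not the demo's code: the name `tokenizeSketch` and the `stem` parameter are hypothetical, and the default `stem` is an identity placeholder rather than the Porter stemmer.

```javascript
// Hypothetical stand-in for FullText.tokenize.
// `stem` maps a lowercased word to its stem; the default leaves words
// unchanged and is only a placeholder for a real stemmer.
function tokenizeSketch(text, locale, stem = (word) => word) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const stems = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Skip whitespace and punctuation segments.
    if (isWordLike) stems.push(stem(segment.toLowerCase()));
  }
  return stems;
}
```

A call such as `tokenizeSketch('Alice met Bob.', 'en')` yields the lowercased words of the input. Passing a real stemmer as the third argument would reduce each word to its stem before it is returned.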
FullText.search(index, query, locale, callback)
Perform a full-text search.
- index: an IDBIndex mapping word-stems to records
- query: text string, e.g. 'alice bob eve'
- locale: locale for tokenizing query (e.g. 'en')
- callback: called with array of primary keys
Must be called while the index's transaction is active. The callback is invoked while the transaction is still active, so more requests can be made within the transaction.
Throws if the query contains no words.
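A possible calling pattern is sketched below. It assumes a store named `documents` with an index named `terms`, both hypothetical, and takes the search implementation as a `fullText` argument exposing `search()` with the signature documented above. The wrapper `searchDocuments` is not part of the demo.

```javascript
// Hypothetical wrapper: run a full-text search, then load the matching records.
// Store and index names are assumptions. Any exception from search()
// (for example a query with no words) propagates to the caller.
function searchDocuments(db, fullText, query, onRecords) {
  const tx = db.transaction('documents', 'readonly');
  const store = tx.objectStore('documents');
  const index = store.index('terms');
  // search() is called right after the transaction is created,
  // so the transaction is still active.
  fullText.search(index, query, 'en', (keys) => {
    const records = new Array(keys.length);
    let remaining = keys.length;
    if (remaining === 0) {
      onRecords(records);
      return;
    }
    // The transaction is active inside the callback, so further
    // requests such as get() can be issued here.
    keys.forEach((key, i) => {
      const request = store.get(key);
      request.onsuccess = () => {
        records[i] = request.result;
        remaining -= 1;
        if (remaining === 0) onRecords(records);
      };
    });
  });
}
```

Issuing the follow-up `get()` requests inside the callback keeps them in the same transaction as the search, so the results reflect one consistent view of the store.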