"And when you search for something, all the kiwix libraries conspire in helping you to retrieve it"
- Definitely Not Paulo Coelho, The Alchemist
My name is Maneesh P M. I am a senior UG in the Dept. of Chemical Engineering at IIT Kanpur. I enjoy building backend technologies, both online and offline.
My project focused on improving the search functionality of openzim and kiwix, covering both full-text search and suggestion search. The major objectives were:
- Drop wrapper structures from kiwix for performance and usability enhancements
- Improve relevant suggestion results and their snippets
- Introduce a versatile suggestion API that is able to work even in the absence of a Xapian index
- Make suggestion handling more stable in kiwix-serve and other sub-projects
The projects I worked on were zim-tools, libzim, libkiwix, kiwix-desktop and kiwix-tools.
I started on the project early. The first stage included usability improvements to zim-tools, a utility that helps with local testing and command-line usage.
- Fix the zim-dump crash on writing to fs openzim/zim-tools#200
- Enable the namespace filter openzim/zim-tools#171
The PRs fixing them: openzim/zim-tools PRs
The work on zim-tools set me up for actually working on the library. I started off with some maintenance work like
- Adapting the library to the new namespace scheme specifications [libzim#529, libzim#477]
- Fixing naming of zimfiles [libzim#502]
- Fixing duplicate results [libzim#276]
Related PRs: openzim/libzim#479, openzim/libzim#503, openzim/libzim#515
Xapian is essentially at the core of our search infrastructure. Several modifications were made to its implementation.
- Optimize the weighting schemes for suggestions [libzim#458]
- Improve phrase searching [libzim#509]
- Improve the collapsing of results [libzim#474]
- Anchor phrases to the beginning [libzim#510]
- Enhance non-word handling and db compact [libzim#536, libzim#417]
These improvements and fixes made the suggestion results much more relevant for the user.
Related PRs: openzim/libzim#492, openzim/libzim#501, openzim/libzim#520, openzim/libzim#526, openzim/libzim#534, openzim/libzim#528
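To give a feel for what "anchoring to the beginning" means for suggestions, here is a toy C++ sketch. It is not the Xapian weighting used in libzim; `anchorScore` and `rankSuggestions` are made-up names, and the scoring values are assumptions chosen only to show that titles starting with the query should come before titles that merely contain it.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-case a copy of the string (ASCII only, which is enough for a sketch).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical score: 2 = title starts with the query,
// 1 = query appears later in the title, 0 = no match.
int anchorScore(const std::string& title, const std::string& query) {
    const std::string t = toLower(title);
    const std::string q = toLower(query);
    const auto pos = t.find(q);
    if (pos == std::string::npos) return 0;
    return pos == 0 ? 2 : 1;
}

// Keep only matching titles and put anchored matches first.
// stable_sort keeps the input order among titles with equal score.
std::vector<std::string> rankSuggestions(std::vector<std::string> titles,
                                         const std::string& query) {
    titles.erase(std::remove_if(titles.begin(), titles.end(),
                                [&](const std::string& t) {
                                    return anchorScore(t, query) == 0;
                                }),
                 titles.end());
    std::stable_sort(titles.begin(), titles.end(),
                     [&](const std::string& a, const std::string& b) {
                         return anchorScore(a, query) > anchorScore(b, query);
                     });
    return titles;
}
```

With the query "kiwix", a title such as "Kiwix" is ranked ahead of "History of Kiwix", which is the behaviour a user typing in a suggestion box usually expects.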
With these fixes improving relevance, we could now move on to user experience: ease of use of the library, snippets, and general maintenance.
- Build get_snippets for title search [openzim/libzim#542]
- Fix ft snippet highlighting issue [openzim/libzim#86]
- Improve unit testing of search iterator [openzim/libzim#546]
Related PRs: openzim/libzim#545, openzim/libzim#559, openzim/libzim#547
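As a rough illustration of what a title snippet is, the sketch below wraps the part of a title that matches the query in bold tags. This is a simplified stand-in, not the libzim code: `highlightTitle` is a hypothetical helper, the `<b>` markup is an assumption, and only the first case-insensitive match (ASCII) is highlighted.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Wrap the first case-insensitive occurrence of `query` in `title`
// with <b>...</b>. Returns the title unchanged when there is no match.
std::string highlightTitle(const std::string& title, const std::string& query) {
    if (query.empty() || query.size() > title.size()) return title;
    auto lower = [](unsigned char c) { return std::tolower(c); };
    for (std::size_t i = 0; i + query.size() <= title.size(); ++i) {
        std::size_t j = 0;
        while (j < query.size() && lower(title[i + j]) == lower(query[j])) ++j;
        if (j == query.size()) {
            // Keep the original casing of the title inside the tags.
            return title.substr(0, i) + "<b>" + title.substr(i, j) + "</b>" +
                   title.substr(i + j);
        }
    }
    return title;
}
```

The original casing of the title is preserved, so a query typed in lower case still yields a snippet that reads like the real title.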
For a while, we shifted our attention to libkiwix to make some fixes based on the work in libzim.
- Improve handling of erroneous cases [kiwix/libkiwix#466, kiwix/libkiwix#496]
- Add method to find zimID from iterator [kiwix/libkiwix#107, openzim/libzim#581]
- Fix Title snippet highlighting [kiwix/libkiwix#82]
Related PRs: kiwix/libkiwix#508, kiwix/libkiwix#510, kiwix/libkiwix#505, kiwix/libkiwix#528
Since kiwix-desktop and kiwix-tools depend on libkiwix, any change in libkiwix has to be reflected in them as well. The addition of a SuggestionItems class for iteration was one such change.
- Fix issues with pagination in kiwix-desktop [kiwix/kiwix-desktop#617]
- Adapting kiwix-desktop to new changes [kiwix/kiwix-desktop#648]
- Adaptation PR [kiwix/kiwix-tools#460]
Related PRs: kiwix/kiwix-desktop#628, kiwix/kiwix-desktop#648, kiwix/kiwix-tools#461
Most of the issues in kiwix-tools were either moved to libkiwix/libzim or fixed via upstream patches.
All this work in itself improved the usage aspect of the library considerably 🎉
BUT the real ball game was yet to begin: work on the architecture and design! I was completely new to this area and spent a considerable amount of time picking it up and building the huge upcoming PRs.
Dropping Wrappers from libkiwix kiwix/libkiwix#430
Essentially, we were redeclaring all the libzim structures inside libkiwix, which was unnecessary and made the code more complicated. So they had to be dropped in three stages.
- Dropping wrappers from Internal Server kiwix/libkiwix#536
- Extending libkiwix structures to be built from libzim directly kiwix/libkiwix#576
- Dropping wrappers from JNI [ytbd]
Some smaller bug fixes encountered during this change:
- Fixing behavior of getResults() kiwix/libkiwix#595
- Fixing behavior of iterator::getTitle() openzim/libzim#586
A major backward-compatibility problem in libzim was that suggestions did not work in the absence of a title/ft index. In that case, one had to use Archive methods manually to get suggestions. To fix this,
Add new Suggestion API to libzim openzim/libzim#564
This was undeniably the longest-running PR, which took 100+ discussion comments, 1500+ lines of code, about 19 commits and a lot of reviews from my mentors Matthieu and Emmanuel. The changes were
- Adding SuggestionSearcher & SuggestionSearch
- Enhancing Archive::iterator methods
- Introducing SuggestionIterator and SuggestionResultSet
- Introducing SuggestionDataBase
- Fixing the compilation with and without Xapian dependency
These features were so interrelated that they had to be done in one go within a single PR for coherence, and therefore the PR grew in size. After these changes, we can use the new Suggestion API with new and old zim files alike, with libzim handling the intricacies of the interactions.
This was finally merged in openzim/libzim#574.
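To show how suggestions can still be served when no Xapian index is available, here is a minimal sketch of one possible fallback: a prefix lookup over titles that are already sorted. This is an illustration under stated assumptions, not the libzim implementation. `prefixSuggestions` is a hypothetical function, the titles are assumed to be sorted in byte order, and `start`/`count` stand for the kind of range a paginated caller would ask for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index-free fallback: return up to `count` titles that begin
// with `prefix`, skipping the first `start` matches.
// Precondition: `sortedTitles` is sorted in byte order.
std::vector<std::string> prefixSuggestions(const std::vector<std::string>& sortedTitles,
                                           const std::string& prefix,
                                           std::size_t start, std::size_t count) {
    assert(std::is_sorted(sortedTitles.begin(), sortedTitles.end()));

    // Binary search for the first title that is not less than the prefix.
    auto first = std::lower_bound(sortedTitles.begin(), sortedTitles.end(), prefix);

    // All titles sharing the prefix are contiguous, so walk forward until
    // a title no longer begins with it.
    auto last = first;
    while (last != sortedTitles.end() &&
           last->compare(0, prefix.size(), prefix) == 0) {
        ++last;
    }

    const std::size_t total = static_cast<std::size_t>(last - first);
    if (start >= total) return {};
    const std::size_t n = std::min(count, total - start);
    const auto from = first + static_cast<std::ptrdiff_t>(start);
    return std::vector<std::string>(from, from + static_cast<std::ptrdiff_t>(n));
}
```

A prefix lookup like this gives no relevance ranking, which is why an indexed search is preferable when the index exists. The point of the sketch is only that the caller can use the same kind of call in both cases.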
With the new suggestions API in place, we added the additional enhancements to libkiwix as well.
- Introducing ranged suggestions kiwix/libkiwix#591
Now we can say 🎉.
Some numbers encapsulating the work in GSoC period:
What's next for the project?
- Enabling caching in kiwix-serve kiwix/libkiwix#509
- Addressing the multizim suggestion issue kiwix/libkiwix#479
Thanks to Matthieu and Emmanuel for their help and encouragement, and thanks to GSoC for providing this awesome opportunity! Kudos 🎊