- self-hosted search engine
- meta search that queries several search APIs, e.g. Bing, Google (are there others?)
- search through an index of my own web browsing history (I may be looking for something I read last week)
- search using the search APIs offered by sites, e.g. Wikipedia, GitHub, archive.org, media.ccc.de, PeerTube, even YouTube (do they even offer APIs? They totally should!)
- search documentation sites I visited before (not just the pages I read: index the whole site or use its own search)
Keep as much of that as possible in a local index, and make it very efficient (in terms of power; storage is probably not an issue).
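A minimal sketch of what such a local index over browsing history could look like, using a plain in-memory inverted index. The class name `HistoryIndex` and the example URLs are made up for illustration; a real setup would more likely use an on-disk full-text index (e.g. SQLite's FTS5 extension) and add ranking.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class HistoryIndex:
    """Hypothetical in-memory inverted index: term -> set of page URLs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.titles = {}

    def add(self, url, title, text):
        """Index one visited page by its title and body text."""
        self.titles[url] = title
        for term in tokenize(title + " " + text):
            self.postings[term].add(url)

    def search(self, query):
        """Return the URLs of pages containing every query term, sorted."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty entries for unknown terms
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

Usage would be along the lines of `idx.add(url, title, text)` for every page visited, then `idx.search("federated search")` to get back the matching URLs.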
Just some stuff that came to mind, in no particular order and with no implied usefulness:
- https://github.com/searxng/searxng
- https://github.com/searxng/searxng-docker
- https://github.com/searxng/searx-instances-uptime
- https://index.commoncrawl.org/
- https://github.com/dstl/Open-Federated-Search
The talk is here: https://media.ccc.de/v/37c3-lightningtalks-58060-honey-i-federated-the-search-engine-finding-stuff-online-post-big-tech?#
Notes (taken from his slides):
The past
- we used to have catalogs of stuff, e.g. Yahoo
- Syndication (Meta Content Framework, RDF Site Summary, Really Simple Syndication)
- Parallel Search (WAIS), gathering, indexing, brokering (e.g. Harvest)
Re-imagining the present
- List of generally trustworthy sites (e.g. Wikipedia, which is better than it is generally given credit for)
- Auto-discovered RSS + XML Sitemap feeds (* Hello Fediverse)
- Personalised search (Elasticsearch, Apache Solr etc.)
- Parallel search (e.g. SearXNG, LibreX) of noteworthy sites (via regular search engines) + personalised
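The parallel search idea from the last bullet can be sketched roughly as follows: send the same query to several backends at once, then merge the results by URL. This is only an illustration of the fan-out and merge pattern, not how SearXNG or LibreX actually implement it; `parallel_search` and the shape of the backend callables are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_search(query, backends):
    """Query all backends concurrently and merge their results by URL.

    `backends` maps a name to a callable taking the query and returning
    a list of {"url": ..., "title": ...} dicts (hypothetical interface).
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results = future.result()
            except Exception:
                continue  # a failing backend should not sink the whole query
            for r in results:
                entry = merged.setdefault(
                    r["url"], {"url": r["url"], "title": r["title"], "sources": []}
                )
                entry["sources"].append(name)
    # Results reported by more backends come first; ties are ordered by URL.
    return sorted(merged.values(), key=lambda e: (-len(e["sources"]), e["url"]))
```

Ranking by the number of backends that agree is just one simple choice; a personalised index could instead be given extra weight.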
So basically:
- set up a self-hosted SearXNG instance
- stuff it with all the search modules I want
- also index stuff that is of special interest to me
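Once such an instance runs, it could be queried from scripts via SearXNG's search endpoint. A rough sketch, assuming the instance is reachable at a URL like `http://localhost:8080` (made up here) and that the `json` output format has been enabled in its settings, which I believe is not the case by default. The helper names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, engines=None):
    """Build a /search URL asking for JSON output (base_url is an assumption)."""
    params = {"q": query, "format": "json"}
    if engines:
        params["engines"] = ",".join(engines)
    return base_url.rstrip("/") + "/search?" + urlencode(params)


def parse_results(payload):
    """Extract (title, url) pairs from a decoded JSON response."""
    return [(r.get("title", ""), r.get("url", "")) for r in payload.get("results", [])]


def search(base_url, query, engines=None):
    """Run one query against the instance; requires the instance to be up."""
    with urlopen(build_search_url(base_url, query, engines), timeout=10) as resp:
        return parse_results(json.load(resp))
```

For example, `search("http://localhost:8080", "federated search", ["wikipedia"])` would return a list of (title, URL) pairs if the instance accepts the request.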
A search history quickly loses its relevance if the sites it points to go away.
So maybe it needs to be combined with ArchiveBox?
https://docs.sweeting.me/s/archivebox-plugin-ecosystem-announcement#
Maybe the archive could be kept in a form like https://alexwlchan.net/2024/static-websites/ .
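As I read the idea, this means plain HTML files on disk that open directly in a browser, with no server involved. A minimal sketch of generating such an index page for archived copies; the function name `render_index` and the entry layout (title, relative path to the saved copy, original URL) are assumptions for illustration.

```python
from html import escape

PAGE_HEAD = (
    "<!DOCTYPE html>\n<html>\n<head><meta charset='utf-8'>"
    "<title>Archive</title></head>\n<body>\n<ul>\n"
)
PAGE_TAIL = "\n</ul>\n</body>\n</html>\n"


def render_index(entries):
    """Render a static index page.

    `entries` is a list of (title, relative_path, original_url) tuples,
    where relative_path points at the locally saved copy.
    """
    items = "\n".join(
        f'  <li><a href="{escape(path)}">{escape(title)}</a> (from {escape(url)})</li>'
        for title, path, url in entries
    )
    return PAGE_HEAD + items + PAGE_TAIL
```

Writing the returned string to an `index.html` next to the saved pages would give a browsable archive that keeps working even when the original sites are gone.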