@hn3000
Last active October 19, 2024 12:59
Search - Federated, personalised

Federated search

What I'd like

  • self-hosted search engine
  • meta search, checking several APIs, e.g. Bing, Google (are there others?)
  • search through an index of my own web browsing history (I may be looking for something I read last week)
  • search using the search APIs offered by sites, e.g. Wikipedia, GitHub, archive.org, media.ccc.de, PeerTube, even YouTube (do they even offer APIs? They totally should!)
  • search documentation sites I visited before (so not just the page I read: index the whole site or use its own search)

Keep as much of that as possible in a local index, and make it very efficient (in power use; storage is probably not an issue).
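
A minimal sketch of what a local index of browsing history could look like, using SQLite's FTS5 full-text extension from Python. This is only an illustration: the table layout and the function names (`open_index`, `add_page`, `search_history`) are made up for this note, and it assumes the SQLite build bundled with Python has FTS5 enabled.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a full-text index of visited pages."""
    db = sqlite3.connect(path)
    # url is stored but not tokenized; title and body are searchable.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages "
        "USING fts5(url UNINDEXED, title, body)"
    )
    return db


def add_page(db, url, title, body):
    """Add one visited page to the index."""
    db.execute(
        "INSERT INTO pages (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    db.commit()


def search_history(db, query, limit=10):
    """Return (url, title) pairs matching all query terms, best match first."""
    # Quote each term so punctuation is not parsed as FTS5 query syntax.
    terms = ['"' + t.replace('"', '""') + '"' for t in query.split()]
    if not terms:
        return []
    rows = db.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? "
        "ORDER BY rank LIMIT ?",
        (" ".join(terms), limit),
    )
    return rows.fetchall()
```

Passing a file path instead of `":memory:"` would keep the index on disk. How pages get into the index (browser extension, proxy, history export) is left open here.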

links

just some stuff that came to mind, no order, no implied usefulness

Martin Hamilton's lightning talk

The talk is here: https://media.ccc.de/v/37c3-lightningtalks-58060-honey-i-federated-the-search-engine-finding-stuff-online-post-big-tech?#

Notes (taken from his slides):

The past

  • we used to have catalogs of stuff, e.g. Yahoo
  • Syndication (Meta Content Framework, RDF Site Summary, Really Simple Syndication)
  • Parallel Search (WAIS), gathering, indexing, brokering (e.g. Harvest)

Re-imagining the present

  • List of generally trustworthy sites (e.g. Wikipedia, it's better than is generally credited)
  • Auto-discovered RSS + XML Sitemap feeds (* Hello Fediverse)
  • Personalised search (Elasticsearch, Apache Solr etc.)
  • Parallel search (e.g. SearXNG, LibreX) of noteworthy sites (via regular search engines) + personalised
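
To make the "parallel search" idea concrete, here is a rough sketch that queries several backends at once and merges their ranked result lists. The merge step uses reciprocal rank fusion, which is my own choice and not something taken from the talk. The backends are hypothetical callables that take a query and return a list of URLs, best first; real ones would wrap a search engine, a site API, or the personal index.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_search(query, backends, k=60):
    """Query all backends concurrently and merge their ranked URL lists.

    backends: dict mapping a name to a callable(query) -> list of URLs.
    k: damping constant for reciprocal rank fusion (60 is a common default).
    """
    scores = {}
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                urls = future.result()
            except Exception:
                # A failing backend should not sink the whole search.
                continue
            for rank, url in enumerate(urls, start=1):
                # URLs returned by several backends accumulate score.
                scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

A URL that shows up in several backends ends up ahead of one that only a single backend returned at the same rank, which seems like a reasonable default for mixing regular search engines with personal results.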

So basically:

  • set up a self-hosted SearXNG instance
  • stuff it with all the search modules I want
  • also index stuff that is of special interest to me
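
Once such an instance is running, it could be queried from scripts as well. A small sketch, assuming the instance listens on `http://localhost:8080` (an assumption, adjust as needed) and that the `json` output format has been enabled in its settings, which as far as I know is not always the case by default. The function names are made up for this note.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of the self-hosted instance.
SEARXNG_URL = "http://localhost:8080"


def build_search_url(query, base_url=SEARXNG_URL):
    """Build the URL for a JSON search request."""
    params = urlencode({"q": query, "format": "json"})
    return base_url.rstrip("/") + "/search?" + params


def parse_results(payload):
    """Extract (title, url) pairs from a JSON response body."""
    data = json.loads(payload)
    return [(r.get("title", ""), r.get("url", "")) for r in data.get("results", [])]


def searxng_search(query, base_url=SEARXNG_URL):
    """Run a query against the instance (needs a reachable server)."""
    with urlopen(build_search_url(query, base_url), timeout=10) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Only `searxng_search` touches the network; the other two helpers can be tried without a running instance.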

Combine with local archival?

Search history quickly loses its relevance if the sites that are found go away.

So maybe it needs to be combined with ArchiveBox?

https://docs.sweeting.me/s/archivebox-plugin-ecosystem-announcement#

Maybe the archive could be kept in a form like https://alexwlchan.net/2024/static-websites/ .
