This is a proposal / spec for a public database and API for links between related webpages, based on ideas presented in this essay. We're building a prototype at Wayfinder, and we'd like input from the community on how it should function and operate.
The goal is to build and provide an open database where anyone can easily create and query links between related webpages. Links are used not just by curious readers but also by search and recommendation engines to find and understand the relationships between content on the web. Therefore:
- Anyone should be able to link content on the web, not just authors and publishers. Today there's no easy way to publicly specify a relationship between two webpages if you're not the original author of one of those pages. This means that the power to connect information is concentrated in a small number of hands. It doesn't have to be.
- Links, all of them, should be accessible to everyone, not just to those with the compute power to build and maintain massive armies of crawlers. We need a database of links that anyone can update and query instantly, to power the next generation of search, knowledge, and recommendation apps.
- The API should allow anyone to create a relationship between two or more pages, or retrieve links that contain a given page, keyword, etc.
- Webpage relationships (links) could be specified as a graph or as a collection.
- Authentication would not be required to query the database (perhaps up to a certain volume of requests), but would be required to add or edit links.
- We're exploring available technologies for the database, and would appreciate insight from anyone with experience in graph or other databases.
- For developers building recommendation engines, news and educational apps, and anything else that suggests related content, there is no public, off-the-shelf source of related-page data available at scale.
- Authors, bloggers, and publishers have no way of ensuring that their content is connected to related pieces, and is therefore seen and read when those pieces are read. For example, a blogger who writes a post on bitcoin will likely never have that post noticed (no matter how many links are in their article) if other posts don't proactively link to theirs. Building this open database takes us one step closer to the original bidirectional spec for hyperlinks.
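The proposed API is easier to picture with a toy version of it. Below is a minimal in-memory sketch, assuming a link is stored as a collection of two or more page URLs plus optional keywords. Every name in it (`Link`, `LinkStore`, `create_link`, `links_for_page`, `links_for_keyword`, `related_pages`, and the `author` parameter standing in for an authenticated identity) is hypothetical. None of it is a committed design or a real backend. The sketch follows the rules above:

- writes require authentication, and reads do not;
- a link can be queried by page or by keyword;
- the same data can be read as a collection or as a graph.

Because every member page is indexed, a link is found from any of its pages. This is the bidirectional behavior the blogger example needs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Link:
    """One relationship among two or more pages (hypothetical record shape)."""
    link_id: int
    urls: tuple[str, ...]     # pages in this collection, in input order
    keywords: frozenset[str]  # lowercase tags used for keyword queries
    author: str               # identity of the authenticated creator


class LinkStore:
    """In-memory sketch of the proposed link database (not a real backend)."""

    def __init__(self) -> None:
        self._links: dict[int, Link] = {}
        self._by_url: dict[str, set[int]] = {}  # page URL -> ids of links containing it
        self._next_id = 1

    def create_link(self, urls: Iterable[str], keywords: Iterable[str] = (),
                    author: Optional[str] = None) -> Link:
        # Writes require an authenticated identity; the read methods below do not.
        if not author:
            raise PermissionError("authentication required to add links")
        unique_urls = tuple(dict.fromkeys(urls))  # drop duplicates, keep order
        if len(unique_urls) < 2:
            raise ValueError("a link must relate at least two distinct pages")
        link = Link(self._next_id, unique_urls,
                    frozenset(k.lower() for k in keywords), author)
        self._links[link.link_id] = link
        # Index every member page, so the link is found from ANY of its pages,
        # not only from the page that "points" at the others.
        for url in unique_urls:
            self._by_url.setdefault(url, set()).add(link.link_id)
        self._next_id += 1
        return link

    def links_for_page(self, url: str) -> list[Link]:
        """Collection view: every link that contains the given page."""
        return [self._links[i] for i in sorted(self._by_url.get(url, set()))]

    def links_for_keyword(self, keyword: str) -> list[Link]:
        """Every link tagged with the given keyword (case-insensitive)."""
        kw = keyword.lower()
        return [link for link in self._links.values() if kw in link.keywords]

    def related_pages(self, url: str) -> set[str]:
        """Graph view: pages that share at least one link with `url`."""
        related = {u for link in self.links_for_page(url) for u in link.urls}
        related.discard(url)
        return related


# Usage: a reader (not the author of either page) links a blog post to Wikipedia.
store = LinkStore()
wiki = "https://en.wikipedia.org/wiki/Bitcoin"
post = "https://example.com/my-bitcoin-post"
store.create_link([wiki, post], keywords=["Bitcoin"], author="alice")
print(store.related_pages(wiki))  # {'https://example.com/my-bitcoin-post'}
```

Storing links as collections and deriving graph edges from shared membership is only one way to satisfy "graph or collection". A real deployment would presumably sit on a graph or relational database, and might also meter anonymous reads as the proposal suggests.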
To make this valuable as a queryable source of related content, the database would need to have queryable data from the get-go. Here are a few sources we could seed the DB with:
- Wikipedia
- Wikia
- Common Crawl
- Bitly (bundles and stories via the Story API)
- Wayfinder.is
This project is being led and supported by Wayfinder. We're open to ideas on how to support this project in the long run, including setting it up as its own 501(c)(3) or attaching it to an existing project like Common Crawl or the Internet Archive.