@wware
Last active March 22, 2026 14:06
Linked Data as MCP — Product Description

What It Is

A SaaS service that turns any linked data resource into an MCP endpoint your chatbot can query.

You give it a URL or connection string. It gives you back an MCP link. You paste that link into your MCP-capable chat client (Claude, Cursor, etc.). Your chatbot can now traverse and reason over that resource intelligently — without you or it ever writing a line of SPARQL or Cypher.

The Problem It Solves

Structured knowledge graphs — DBpedia, Wikidata, UniProt, ChEMBL, internal Neo4j instances, domain-specific SPARQL endpoints — contain enormous amounts of curated, queryable knowledge. Almost none of it is accessible to typical chatbot users, because:

  • SPARQL and Cypher are hard to write correctly
  • LLM-generated graph queries are unreliable on anything non-trivial
  • Raw query results are not context-window-friendly
  • There is no standard way to say "connect my chatbot to this graph"

The gap is not the knowledge. The knowledge is there, maintained, and free. The gap is the interface.

How It Works

Step 1: Provision

You submit a URL — the address of a SPARQL endpoint or linked data resource:

https://dbpedia.org/sparql
https://query.wikidata.org/sparql
https://sparql.uniprot.org/sparql
https://your-internal-fuseki-instance/dataset

The service probes the endpoint, discovers its schema (entity types, predicate vocabulary, URI patterns), and builds a seed resolver — a search layer that maps natural-language names and aliases to canonical URIs in that graph.
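As a rough sketch of what the probe might do (the query text and the `build_seed_resolver` helper below are illustrative, not the service's actual code): schema discovery amounts to a few aggregate SPARQL queries, and the seed resolver is an index from lowercase labels and aliases to canonical URIs.

```python
# Hypothetical sketch of the provisioning probe. Query strings and helper
# names are illustrative assumptions, not the service's real implementation.

DISCOVERY_QUERIES = {
    # Which entity types exist in the graph, and how common are they?
    "entity_types": (
        "SELECT ?type (COUNT(?s) AS ?n) WHERE { ?s a ?type } "
        "GROUP BY ?type ORDER BY DESC(?n) LIMIT 100"
    ),
    # Which predicates make up the graph's vocabulary?
    "predicates": (
        "SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } "
        "GROUP BY ?p ORDER BY DESC(?n) LIMIT 200"
    ),
}

def build_seed_resolver(label_rows):
    """Build a seed resolver: lowercase label/alias -> set of canonical URIs.

    label_rows: iterable of (uri, label) pairs, e.g. from an rdfs:label query.
    """
    index = {}
    for uri, label in label_rows:
        index.setdefault(label.lower(), set()).add(uri)
    return index
```

The same entity typically has several labels and aliases, so the resolver maps many names to one URI; that is what lets `search_entities` accept natural-language names later.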

Step 2: Connect

The service returns an MCP endpoint URL:

mcp.example.com/kg/dbpedia-abc123

You paste this into your chat client's MCP settings. No other configuration required.

Step 3: Query Naturally

Your chatbot now has access to three MCP tools against that resource:

search_entities(query) — find entities by name or alias, returns canonical IDs.

bfs_query(seeds, max_hops, node_filter, edge_filter) — expand a neighborhood from one or more seed entities, with control over traversal depth and which node types and edge predicates get full data vs. lightweight stubs.

describe_entity(id) — retrieve full metadata for a single entity.

The chatbot uses these tools the same way regardless of which resource is behind them. DBpedia and a private hospital knowledge base look identical from the chatbot's perspective.
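A typical interaction chains the three tools in sequence. The following is an illustrative call trace only; the entity IDs and parameter values are made up for the example.

```python
# Illustrative tool-call sequence a chatbot might emit. IDs are assumptions.
calls = [
    # 1. Resolve a natural-language name to canonical IDs.
    ("search_entities", {"query": "Cushing's syndrome"}),
    # 2. Expand the neighborhood around the resolved seed.
    ("bfs_query", {
        "seeds": ["dbpedia:Cushing%27s_syndrome"],
        "max_hops": 2,
        "node_filter": {"entity_types": ["Drug", "Gene"]},
        "edge_filter": {"predicates": ["TREATS", "ASSOCIATED_WITH"]},
    }),
    # 3. Pull full metadata for an entity the traversal surfaced.
    ("describe_entity", {"id": "dbpedia:Ketoconazole"}),
]
```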

The Query Model

The BFS query interface is designed for LLM consumption rather than human authorship. A typical query:

{
  "seeds": ["dbpedia:Cushing%27s_syndrome"],
  "max_hops": 2,
  "node_filter": { "entity_types": ["Drug", "Gene"] },
  "edge_filter": { "predicates": ["TREATS", "ASSOCIATED_WITH"] }
}

Key design properties:

  • Topology is always complete. Filters control the detail level of nodes and edges, not which ones appear. Non-matching nodes appear as lightweight stubs so the chatbot always sees an accurate picture of the graph's shape.
  • Context-window aware. Full provenance and metadata are returned only where requested. A two-hop neighborhood of a well-connected node does not flood the context.
  • Multi-seed. Passing multiple seeds expresses relational questions ("what do these two entities have in common?") using the same query structure.
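To make the stub model concrete, a response for the query above might look like the following. The field names and wire format here are assumptions for illustration; the document does not specify the actual response schema.

```python
# Hypothetical response shape: full detail only for filter-matching nodes,
# lightweight stubs for everything else, edges always complete.
response = {
    "nodes": [
        {"id": "kb:Ketoconazole", "stub": False, "entity_type": "Drug",
         "metadata": {"label": "Ketoconazole"}},
        {"id": "kb:Hypercortisolism", "stub": True},  # topology only, no payload
    ],
    "edges": [
        {"src": "kb:Ketoconazole", "dst": "kb:Hypercortisolism",
         "predicate": "TREATS"},
    ],
}
```

The stub node costs a few tokens rather than a full metadata payload, which is how a two-hop neighborhood stays inside the context window while the graph's shape remains accurate.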

Backend Abstraction Layer

The BFSQL server, the service component that implements the BFS query model described above, is backend-agnostic. Behind it sits a GraphDbInterface ABC that each supported data source implements. The interface is deliberately primitive — basic graph navigation only:

from abc import ABC, abstractmethod
from typing import Any

class GraphDbInterface(ABC):

    @abstractmethod
    def search_entities(self, query: str) -> list[EntityStub]: ...

    @abstractmethod
    def edges_from(self, entity_id: str) -> list[Edge]: ...

    @abstractmethod
    def edges_to(self, entity_id: str) -> list[Edge]: ...

    @abstractmethod
    def get_node(self, entity_id: str) -> Node: ...

    @abstractmethod
    def metadata_for_node(self, entity_id: str) -> dict[str, Any]: ...

    @abstractmethod
    def metadata_for_edge(self, edge: Edge) -> dict[str, Any]: ...

All the BFSQL-level intelligence — BFS traversal, multi-seed union, stub/full filtering, context-window management, provenance shaping — is implemented once in the BFSQL server in terms of these primitives. Backend implementors answer only one question: how do I perform basic graph navigation against this particular store?
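The core traversal reduces to a textbook multi-seed BFS over those primitives. The sketch below assumes an edge object with src/dst attributes and omits filters, stub/full shaping, and incoming edges; it is a simplification of the idea, not the server's code.

```python
from collections import deque

def bfs(backend, seeds, max_hops):
    """Multi-seed BFS expressed purely in GraphDbInterface primitives.

    Simplified sketch: outgoing edges only, no node/edge filters, no
    stub/full distinction. `edge.dst` is an assumed attribute name.
    """
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)  # (entity_id, depth) pairs
    edges = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # at the horizon: node is included, but not expanded
        for edge in backend.edges_from(node):
            edges.append(edge)
            if edge.dst not in visited:
                visited.add(edge.dst)
                frontier.append((edge.dst, depth + 1))
    return visited, edges
```

Because the loop only ever calls `edges_from`, any store that can answer that one question inherits multi-hop traversal without further work.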

Some Potential Backends

SparqlBackend — translates the primitive operations into SPARQL 1.1 queries against any standards-compliant endpoint. Handles prefix mapping and URI normalization. Covers DBpedia, Wikidata, UniProt, ChEMBL, Fuseki, Virtuoso, GraphDB, Stardog, Neptune, and others.
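The translation for a single primitive is small. A plausible rendering of `edges_from` as SPARQL 1.1, shown as a sketch (a real backend would also handle prefix expansion, URI escaping, and literal-valued properties):

```python
def edges_from_query(entity_uri: str, limit: int = 500) -> str:
    """Sketch: translate the edges_from primitive into a SPARQL 1.1 query.

    Covers only the outgoing-edge, entity-to-entity case; isIRI() drops
    literal objects so only graph edges come back.
    """
    return (
        f"SELECT ?p ?o WHERE {{ <{entity_uri}> ?p ?o . "
        f"FILTER(isIRI(?o)) }} LIMIT {limit}"
    )
```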

PostgresBackend — translates primitive operations into SQL against a PostgreSQL schema with pgvector. The natural backend for kgraph-derived graphs. Also supports search_entities via vector similarity search on entity embeddings.

Neo4jBackend — translates primitive operations into Cypher via the official Neo4j Python driver. edges_from and edges_to are natural graph traversals (MATCH (n)-[r]->(m)); no join logic required. search_entities relies on a Neo4j full-text index, which must exist on the target database — unlike PostgresBackend, there is no vector similarity fallback unless embeddings have been stored as node properties and a vector index configured separately. The main caveat is that Neo4j is a property graph, not an RDF store: there are no URIs, no prefix namespaces, and no guarantee of a consistent entity_type property across databases. The backend must be initialized with a mapping that identifies which node label(s) correspond to entity types and which relationship types correspond to predicates in the BFSQL model.

Adding a Backend

Any data source that can express "give me edges from this node," "give me this node's metadata," and "find me entities matching this name" can get a backend. The interface is small and has no framework dependencies. A new backend — a JSON-LD REST API, a Neo4j Cypher store, a custom in-memory graph — requires only that six methods be implemented correctly. The full BFSQL query semantics are inherited for free.
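A toy in-memory backend makes the size of the contract concrete. The node and edge representations below (plain dicts and tuples) are illustrative stand-ins for the real EntityStub/Edge/Node types, which are not defined here.

```python
from typing import Any

class InMemoryBackend:
    """Toy backend: the six GraphDbInterface primitives over plain dicts.

    Assumes nodes as {id: {"label": ..., "type": ...}} and edges as
    (src, predicate, dst) tuples; the real types are richer.
    """
    def __init__(self, nodes: dict, edges: list):
        self.nodes = nodes
        self.edges = edges

    def search_entities(self, query: str) -> list:
        q = query.lower()
        return [i for i, n in self.nodes.items() if q in n["label"].lower()]

    def edges_from(self, entity_id: str) -> list:
        return [e for e in self.edges if e[0] == entity_id]

    def edges_to(self, entity_id: str) -> list:
        return [e for e in self.edges if e[2] == entity_id]

    def get_node(self, entity_id: str) -> dict:
        return self.nodes[entity_id]

    def metadata_for_node(self, entity_id: str) -> dict[str, Any]:
        return dict(self.nodes[entity_id])

    def metadata_for_edge(self, edge) -> dict[str, Any]:
        return {"predicate": edge[1]}
```

Roughly thirty lines of adapter code, and the BFS traversal, multi-seed union, and stub/full filtering layers all work against it unchanged.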

Caching

Backing resources are treated as read-only and static. This means query results can be cached aggressively without any cache invalidation logic. The BFSQL server maintains an LRU cache keyed on (backend_id, method, args). A repeated edges_from or metadata_for_node call against the same entity returns the cached result immediately, with no round-trip to the underlying store.

This has two practical benefits: latency drops significantly for any traversal that revisits nodes (common in multi-hop BFS), and load on the upstream endpoint is reduced — important for public SPARQL endpoints that rate-limit or deprioritize automated traffic.

The cache operates at the GraphDbInterface primitive level, not at the BFSQL query level, so all the upstream intelligence — BFS traversal, stub/full filtering, multi-seed union — benefits automatically.
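A minimal sketch of such a cache, assuming hashable argument tuples (the class and method names here are illustrative, not the server's actual code):

```python
from collections import OrderedDict

class PrimitiveCache:
    """LRU cache keyed on (backend_id, method, args) for read-only backends.

    Sketch only: no TTL or size-in-bytes accounting, just entry-count LRU.
    """
    def __init__(self, maxsize: int = 10_000):
        self.maxsize = maxsize
        self._store = OrderedDict()

    def get_or_call(self, backend_id, method, args, fn):
        key = (backend_id, method, args)  # args must be hashable
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        value = fn()                      # cache miss: hit the backend once
        self._store[key] = value
        if len(self._store) > self.maxsize:
            self._store.popitem(last=False)  # evict least recently used
        return value
```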

Supported Resources

V1 — Standard SPARQL endpoints. Any endpoint speaking SPARQL 1.1 with a discoverable schema. This covers the major public linked data resources and most self-hosted triple stores (Apache Jena Fuseki, Virtuoso, GraphDB, Stardog, Amazon Neptune).

V2 — RDF dumps and file-based sources. Turtle, N-Triples, JSON-LD, and RDF/XML files accessible by URL. The service loads these into a managed store and serves them through the same interface.

Roadmap. Schema.org-annotated web pages; JSON-LD APIs; hybrid graphs that combine an external linked data resource with documents you have ingested yourself.

Who It's For

  • Researchers who want their chatbot to reason over domain knowledge graphs (biomedical, legal, materials science, geospatial) without learning SPARQL
  • Analysts at organizations that publish or consume linked open data
  • Developers building LLM applications that need grounded, citable, structured knowledge alongside generative capabilities
  • Teams that have deployed an internal triple store and want LLM access to it without building a custom integration

Pricing Model

| Tier         | Resource types                          | Queries/month | Price   |
|--------------|-----------------------------------------|---------------|---------|
| Free         | Public endpoints only                   | 1,000         | $0      |
| Standard     | Public + file-based                     | 20,000        | $29/mo  |
| Professional | Public + private + file-based           | 200,000       | $149/mo |
| Enterprise   | All, plus SLA and custom schema support | Unlimited     | Contact |

Private endpoints (your own Fuseki, your corporate triple store) require Standard tier or above.

Relationship to kgraph

This service is a standalone product. It does not require kgraph.

For users who also want to extract their own knowledge graphs from unstructured documents, the two compose naturally: your chatbot can hold an MCP connection to an external resource (e.g., DBpedia) and a separate MCP connection to your own kgraph-derived graph simultaneously, reasoning over both in the same conversation.
