A SaaS service that turns any linked data resource into an MCP endpoint your
chatbot can query.
You give it a URL or connection string. It gives you back an MCP link. You paste
that link into your MCP-capable chat client (Claude, Cursor, etc.). Your chatbot
can now traverse and reason over that resource intelligently, without you or it
writing any custom integration code.
The identity server is the authoritative component for entity identity across the
knowledge graph. It handles the full lifecycle of an entity's identity: resolving
a mention to an ID, promoting a provisional entity to canonical status, detecting
synonyms, and merging duplicates into a single survivor. All of this must be
correct under concurrent access from multiple worker processes and multiple server
replicas.
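The lifecycle above can be sketched as a toy in-memory server. All class and method names here are hypothetical illustrations, not the real IdentityServer API, and a single lock stands in for whatever concurrency control the real system uses across worker processes and replicas:

```python
import threading
import uuid

class IdentityServer:
    """Toy sketch of the identity lifecycle: resolve, promote, merge."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_mention = {}   # normalized mention -> entity id
        self._status = {}       # entity id -> "provisional" | "canonical"
        self._merged_into = {}  # duplicate id -> surviving id

    def resolve(self, mention: str) -> str:
        """Return the entity id for a mention, creating a provisional entity if new."""
        key = mention.strip().lower()
        with self._lock:
            if key not in self._by_mention:
                eid = uuid.uuid4().hex
                self._by_mention[key] = eid
                self._status[eid] = "provisional"
            return self._canonical_id(self._by_mention[key])

    def promote(self, eid: str) -> None:
        """Promote a provisional entity to canonical status."""
        with self._lock:
            self._status[self._canonical_id(eid)] = "canonical"

    def merge(self, loser: str, survivor: str) -> None:
        """Merge a duplicate entity into a single survivor."""
        with self._lock:
            self._merged_into[loser] = survivor

    def _canonical_id(self, eid: str) -> str:
        # Follow merge links to the current survivor.
        while eid in self._merged_into:
            eid = self._merged_into[eid]
        return eid
```

Note that in a real multi-replica deployment the lock would have to be replaced by database transactions or an equivalent coordination mechanism; the sketch only shows the shape of the operations.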
This document describes the ingestion pipeline for the kgraph/medlit system.
Design Principles
Dedup-on-write. The IdentityServer resolves entity identity and detects
synonyms incrementally as each entity is written. There is no global deduplication
pass.
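A minimal sketch of what dedup-on-write means in practice (function names and the synonym-table shape are hypothetical): the synonym check runs inside each write, rather than as a later batch job over the whole graph.

```python
def write_entity(store: dict, synonyms: dict, name: str) -> str:
    """Insert an entity, deduplicating at write time.

    `store` maps a preferred name to an entity id; `synonyms` maps
    alternate surface forms to the preferred name (both hypothetical).
    """
    canonical_name = synonyms.get(name.lower(), name.lower())
    if canonical_name in store:
        return store[canonical_name]      # duplicate detected on this write
    entity_id = f"E{len(store) + 1}"
    store[canonical_name] = entity_id     # first sighting: create the entity
    return entity_id
```

Because the check happens per write, two workers ingesting "Aspirin" and "ASA" converge on the same entity without any sweep over previously written data.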
Generalizing literature graphs across knowledge domains
Prompt:
1/18/2026, 10:16:42 AM
I want to reframe the medical literature project a bit, allow it to be generalized to other domains of knowledge. We are still building a graph and a graph still consists of nodes (entities) and edges (relationships). We still have a collection of entities from previous ingestion processes. We add a new thing: entities may be "canonical", that is they have been assigned canonical IDs (UMLS numbers or whatever) or they may be "provisional", meaning that we don't know yet if they should be assigned canonical IDs, for instance an entity might be a mention of some trivial thing in just one paper.
Given a batch of papers to ingest, we proceed in two passes. First pass we extract entities and assign canonical IDs where they make sense. Second pass we identify the edges (for medical, these edges are of three types: extraction, claims, and evidence). The first pass produces a JSON serialization of the collection of entities.
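The two-pass scheme could be sketched like this. The extraction logic, the lookup table, and the JSON shape are illustrative assumptions, not the project's actual format:

```python
import json

# Toy canonical-ID lookup standing in for a UMLS (or similar) resolver (assumed).
KNOWN_CANONICAL_IDS = {"aspirin": "UMLS:C0004057"}

def ingest_batch(papers):
    # Pass 1: extract entities; assign canonical IDs where they make sense,
    # otherwise mark the entity provisional.
    entities = []
    for paper in papers:
        for name in paper["mentions"]:
            cid = KNOWN_CANONICAL_IDS.get(name)
            entities.append({
                "name": name,
                "status": "canonical" if cid else "provisional",
                "canonical_id": cid,
            })
    entities_json = json.dumps(entities)  # pass 1 output: serialized entities

    # Pass 2: identify edges between extracted entities. Here we just link
    # adjacent mentions with a placeholder type; the medical domain would
    # classify each edge as extraction, claims, or evidence.
    edges = []
    for paper in papers:
        mentions = paper["mentions"]
        for i in range(len(mentions) - 1):
            edges.append({"src": mentions[i], "dst": mentions[i + 1],
                          "type": "claims"})
    return entities_json, edges
```

The key property is that pass 2 runs against the entity collection pass 1 produced, so every edge endpoint is already either canonical or provisional.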
Battle Plan: Medical Knowledge Graph Ingestion Optimization
This document outlines the strategic roadmap for integrating Large Language Models (LLMs) into the ingestion pipeline, specifically targeting Stage 4 (Claims Extraction) and Stage 5 (Evidence Aggregation).
1. LLM Infrastructure Strategy
The choice of model is critical for handling the specialized technical language and structured data extraction requirements of medical research papers.
Short answer: yes, this test suite makes a lot of sense, and it’s actually quite strong architecturally. Longer answer below, with praise and some concrete suggestions where you might tighten it.
What you’re doing well
1. Clear layering and intent
Your tests map very cleanly onto your system architecture: