This file is a reference for Claude while assisting with the writing and technical development of a book. It defines the formal model precisely, establishes vocabulary, states hard rules, and lists explicit non-goals. When in doubt, check against this file before generating code, prose, or schema definitions.
I could think of worse ideas than packaging the first book with the identity-server repo, the second book with the kgraph repo, and the third book with the bfs-ql repo, with all three sharing a git submodule for the book-creation tooling, customized per repo as needed.
Let's consider a block diagram like this. Each of these is a microservice, which makes some assumptions about what other components are available. Each service is scalable as required.
- Identity server - move it from TestPyPI to PyPI, possibly include the first book; otherwise little or no change.
- BFS-QL - same treatment (except with the third book).
- Kgraph (the ingestion pipeline) gets similar treatment with the second book: it also goes to PyPI, and it ships with a default Domain it can run against. Domains are then things that look like that example Domain. This is the ingestion pipeline, initially domain agnostic. Maybe use [that markitdown thing from
Link: https://claude.ai/chat/ced2f44c-eace-4c0f-9a49-4df7c7a42b10
4/12/2026, 10:39:28 AM
It occurs to me that the graph I'm building for Graphwright is specifically a typed graph, not just a generic knowledge graph, in the sense that there are a finite number of allowed entity types and a finite number of allowed predicates. This allows for error checking while constructing the graph, because we can specify a domain and a range for each predicate in terms of entity types. It may also permit some optimizations in SQL constructs; I haven't thought that through yet.
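The domain/range idea can be sketched with a tiny validator. The schema, predicate names, and entity types below are hypothetical examples for illustration, not part of any actual Graphwright domain:

```python
from dataclasses import dataclass

# Hypothetical schema: each predicate declares a domain and a range
# in terms of allowed entity types.
SCHEMA = {
    "treats": ("Drug", "Disease"),
    "causes": ("Disease", "Symptom"),
}

@dataclass(frozen=True)
class Edge:
    subject_type: str
    predicate: str
    object_type: str

def check_edge(edge: Edge) -> bool:
    """True iff the predicate is known and its domain/range match."""
    if edge.predicate not in SCHEMA:
        return False
    domain, rng = SCHEMA[edge.predicate]
    return edge.subject_type == domain and edge.object_type == rng

# check_edge(Edge("Drug", "treats", "Disease"))  -> True
# check_edge(Edge("Drug", "causes", "Symptom"))  -> False (wrong domain)
```

Because the schema is a finite table, the same check could be pushed into SQL as a foreign-key or CHECK constraint on a predicates table.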
A SaaS service that turns any linked data resource into an MCP endpoint your chatbot can query.
You give it a URL or connection string. It gives you back an MCP link. You paste that link into your MCP-capable chat client (Claude, Cursor, etc.). Your chatbot can now traverse and reason over that resource intelligently — without you or it
The identity server is the authoritative component for entity identity across the knowledge graph. It handles the full lifecycle of an entity's identity: resolving a mention to an ID, promoting a provisional entity to canonical status, detecting synonyms, and merging duplicates into a single survivor. All of this must be correct under concurrent access from multiple worker processes and multiple server replicas.
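The lifecycle above can be sketched as a toy, single-process class. Everything here (names, storage, normalization) is a hypothetical stand-in; a real implementation needs transactional storage to stay correct under concurrent workers and replicas:

```python
import itertools

class IdentityServer:
    """Toy sketch of the identity lifecycle: resolve a mention to an ID,
    promote provisional -> canonical, merge duplicates into a survivor."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_mention = {}    # normalized mention -> entity id
        self._canonical = set()  # ids promoted to canonical status
        self._merged_into = {}   # loser id -> survivor id

    def resolve(self, mention: str) -> int:
        key = mention.strip().lower()
        if key not in self._by_mention:
            self._by_mention[key] = next(self._ids)  # starts provisional
        return self._follow(self._by_mention[key])

    def promote(self, eid: int) -> None:
        self._canonical.add(self._follow(eid))

    def merge(self, loser: int, survivor: int) -> int:
        loser, survivor = self._follow(loser), self._follow(survivor)
        if loser != survivor:
            self._merged_into[loser] = survivor
        return survivor

    def _follow(self, eid: int) -> int:
        # Chase merge redirects so old IDs resolve to the survivor.
        while eid in self._merged_into:
            eid = self._merged_into[eid]
        return eid
```

After `merge(b, a)`, any mention that previously resolved to `b` now resolves to `a`, which is the synonym-merge behavior the server has to guarantee atomically.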
Created: 2/7/2026 1:50:45
Updated: 2/7/2026 9:13:50
Exported: 2/7/2026 9:46:03
Link: https://claude.ai/chat/8c15c90c-1838-4db7-bc11-62f415c5a421
2/7/2026, 1:50:46 AM
Please review the recent work in https://github.com/wware/kgraph, in particular https://github.com/wware/kgraph/commit/59768ceb515e2c3981127623ad1eae04aab925a0 and let me know whether this looks correct to you.
Below is a focused review: what looks correct, what looks especially good, and a small number of actionable cautions.
Created: 1/18/2026 9:17:52
Updated: 1/23/2026 8:36:32
Exported: 1/23/2026 9:50:05
Link: https://claude.ai/chat/866d9e7b-e0eb-4606-a58c-2f7704ad6c8a
1/18/2026, 9:17:53 AM
1/18/2026, 10:16:42 AM
I want to reframe the medical-literature project a bit, allowing it to be generalized to other domains of knowledge. We are still building a graph, and a graph still consists of nodes (entities) and edges (relationships). We still have a collection of entities from previous ingestion runs. We add one new thing: an entity may be "canonical", meaning it has been assigned a canonical ID (a UMLS number or whatever), or "provisional", meaning we don't yet know whether it should be assigned a canonical ID; for instance, it might be a mention of some trivial thing in just one paper.
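The canonical/provisional distinction can be captured with a small status model. The field names and the `UMLS:` ID format here are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    PROVISIONAL = "provisional"  # no canonical ID assigned yet
    CANONICAL = "canonical"      # assigned e.g. a UMLS number

@dataclass
class Entity:
    mention: str
    status: Status = Status.PROVISIONAL
    canonical_id: Optional[str] = None  # hypothetical format, e.g. "UMLS:C0004057"

    def promote(self, canonical_id: str) -> None:
        """Assign a canonical ID, moving the entity from provisional to canonical."""
        self.canonical_id = canonical_id
        self.status = Status.CANONICAL
```

Promotion is one-way here; demoting a canonical entity back to provisional is a separate design question.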
Given a batch of papers to ingest, we proceed in two passes. In the first pass we extract entities and assign canonical IDs where they make sense. In the second pass we identify the edges (for the medical domain, these edges are of three types: extraction, claims, and evidence). The first pass produces a JSON serialization of the collection of entities.
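The two-pass shape can be sketched end to end. The extractors below are toy stand-ins (capitalized-word matching and co-occurrence pairing) for the real NLP components, and `co_occurs_with` is a placeholder predicate, not one of the actual extraction/claims/evidence edge types:

```python
import json
import re

def extract_entities(paper: str) -> list[dict]:
    """Pass 1: pull entity mentions from one paper (toy: capitalized words)."""
    return [{"mention": m, "canonical_id": None}
            for m in sorted(set(re.findall(r"\b[A-Z][a-z]+\b", paper)))]

def extract_edges(paper: str, entities: list[dict]) -> list[dict]:
    """Pass 2: link entities that appear in the same paper (toy pairing)."""
    mentions = [e["mention"] for e in entities if e["mention"] in paper]
    return [{"subject": a, "predicate": "co_occurs_with", "object": b}
            for a, b in zip(mentions, mentions[1:])]

def ingest(batch: list[str]) -> str:
    """Run both passes over a batch; pass 1's output is serialized as JSON."""
    entities = [e for p in batch for e in extract_entities(p)]
    edges = [ed for p in batch for ed in extract_edges(p, entities)]
    return json.dumps({"entities": entities, "edges": edges}, indent=2)
```

The point of the structure is that pass 2 only sees the entity collection produced by pass 1, so the JSON serialization is a natural checkpoint between the passes.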