
@wware
wware / 1_CLAUDE.md
Last active May 4, 2026 16:42
Thinking about building a typed graph for a Sherlock Holmes story

CLAUDE.md — The Typed Graph

This file is a reference for Claude while assisting with the writing and technical development of a book. It defines the formal model precisely, establishes vocabulary, states hard rules, and lists explicit non-goals. When in doubt, check against this file before generating code, prose, or schema definitions.


More Graphwright Architecture Ideas

I could think of worse ideas than packaging the first book with the identity-server repo, the second book with the kgraph repo, and the third book with the bfs-ql repo. All three repos would share a git submodule containing the common book-creation tooling, customized per repo as needed.

Let's consider a block diagram like this. Each of these is a microservice, which makes some assumptions about what other components are available. Each service is scalable as required.

  • Identity server - move it from TestPyPI to PyPI, possibly bundling the first book; otherwise little or no change.
  • BFS-QL - same treatment (but with the third book).
  • Kgraph (the ingestion pipeline) - same treatment with the second book: publish to PyPI, and ship a default example Domain it can run against out of the box. New domains are then defined by imitating that example Domain. The pipeline itself is initially domain-agnostic. Maybe use [that markitdown thing from

Typed graphs

Link: https://claude.ai/chat/ced2f44c-eace-4c0f-9a49-4df7c7a42b10

Prompt:

4/12/2026, 10:39:28 AM

It occurs to me that the graph I'm building for Graphwright is specifically a typed graph, not just a generic knowledge graph: there is a finite set of allowed entity types and a finite set of allowed predicates. Because each predicate can be given a domain and a range expressed in terms of entity types, the graph can be error-checked as it is constructed. The type constraints may also permit some optimizations in SQL constructs, but I haven't thought that through yet.
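The domain/range checking described above can be sketched in a few lines of Python. This is a minimal illustration, not Graphwright's actual schema: the entity types, predicate names, and function names here are all hypothetical.

```python
from dataclasses import dataclass

# Illustrative typed-graph schema: a finite set of entity types, and a
# finite set of predicates, each with a declared (domain, range) pair.
ENTITY_TYPES = {"Person", "Place", "Event"}
PREDICATES = {
    "located_in": ("Person", "Place"),   # predicate -> (domain, range)
    "attended":   ("Person", "Event"),
    "held_at":    ("Event", "Place"),
}

@dataclass(frozen=True)
class Node:
    node_id: str
    etype: str

def add_edge(edges, subj: Node, pred: str, obj: Node):
    """Reject edges whose predicate is unknown or whose endpoints
    violate the predicate's declared domain/range."""
    if pred not in PREDICATES:
        raise ValueError(f"unknown predicate: {pred}")
    dom, rng = PREDICATES[pred]
    if subj.etype != dom:
        raise TypeError(f"{pred} expects domain {dom}, got {subj.etype}")
    if obj.etype != rng:
        raise TypeError(f"{pred} expects range {rng}, got {obj.etype}")
    edges.append((subj.node_id, pred, obj.node_id))

edges = []
holmes = Node("E1", "Person")
baker_st = Node("E2", "Place")
add_edge(edges, holmes, "located_in", baker_st)      # OK: Person -> Place
try:
    add_edge(edges, baker_st, "located_in", holmes)  # domain violation
except TypeError as e:
    print("rejected:", e)
```

Because the violation is caught at insert time, a malformed edge never reaches the store; this is the error-checking-during-construction property the paragraph above describes.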

Response:

Linked Data as MCP — Product Description

What It Is

A SaaS service that turns any linked data resource into an MCP endpoint your chatbot can query.

You give it a URL or connection string. It gives you back an MCP link. You paste that link into your MCP-capable chat client (Claude, Cursor, etc.). Your chatbot can now traverse and reason over that resource intelligently — without you or it

Identity Server Specification

The identity server is the authoritative component for entity identity across the knowledge graph. It handles the full lifecycle of an entity's identity: resolving a mention to an ID, promoting a provisional entity to canonical status, detecting synonyms, and merging duplicates into a single survivor. All of this must be correct under concurrent access from multiple worker processes and multiple server replicas.
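The lifecycle above (resolve, promote, merge) suggests an interface like the following. This is a hedged sketch, not the real identity server: the class and method names are invented, and a single in-process lock stands in for the database transactions a real multi-replica deployment would need for correctness under concurrent access.

```python
import threading
import uuid

class IdentityServer:
    """Illustrative sketch of the identity lifecycle: resolve a mention
    to an ID, promote provisional -> canonical, and merge duplicates so
    one survivor remains. Names and storage are hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()   # stand-in for DB-level transactions
        self._by_mention = {}           # normalized mention -> entity id
        self._canonical = {}            # entity id -> canonical id, or None
        self._merged_into = {}          # loser id -> survivor id

    def resolve(self, mention: str) -> str:
        """Resolve a mention to an entity ID, creating a provisional
        entity on first sight."""
        key = mention.strip().lower()
        with self._lock:
            if key not in self._by_mention:
                eid = f"prov:{uuid.uuid4().hex[:8]}"
                self._by_mention[key] = eid
                self._canonical[eid] = None   # provisional: no canonical ID
            return self._follow(self._by_mention[key])

    def promote(self, eid: str, canonical_id: str):
        """Promote a provisional entity to canonical status."""
        with self._lock:
            self._canonical[self._follow(eid)] = canonical_id

    def merge(self, loser: str, survivor: str):
        """Merge duplicates: future lookups of the loser transparently
        return the survivor."""
        with self._lock:
            self._merged_into[loser] = survivor

    def _follow(self, eid: str) -> str:
        while eid in self._merged_into:   # chase merge chain to survivor
            eid = self._merged_into[eid]
        return eid
```

The merge-chain chase in `_follow` is what makes merges safe for readers: an ID handed out before a merge still resolves to the surviving entity afterward.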


Ingestion Architecture

This document describes the ingestion pipeline for the kgraph/medlit system.


Design Principles

Dedup-on-write. The IdentityServer resolves entity identity and detects synonyms incrementally as each entity is written. There is no global deduplication pass.
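A minimal sketch of the dedup-on-write idea, under the assumption that every write is routed through an identity-resolution step. The function and data-structure names here are illustrative, not the kgraph API; a real system would consult the IdentityServer rather than a local dict.

```python
def write_entity(store: dict, synonyms: dict, mention: str, payload: dict) -> str:
    """Write one entity, resolving its identity first. `synonyms` maps
    known surface forms to an existing entity id, so duplicates are
    caught at write time rather than by a separate global pass."""
    key = mention.strip().lower()
    eid = synonyms.get(key)
    if eid is None:
        eid = f"ent:{len(store)}"       # new entity: mint an id
        store[eid] = dict(payload)
        synonyms[key] = eid
    else:
        store[eid].update(payload)      # duplicate mention: merge into existing
    return eid

store, synonyms = {}, {}
e1 = write_entity(store, synonyms, "Aspirin", {"source": "paper-1"})
e2 = write_entity(store, synonyms, "aspirin", {"source2": "paper-2"})
```

Both writes land on the same entity, so the store never accumulates duplicates that a batch job would later have to reconcile.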

@wware
wware / Future_Directions.md
Last active February 10, 2026 06:33
Future directions for the knowledge graph stuff
@wware
wware / Plan9.md
Last active February 3, 2026 20:58

Generalizing literature graphs across knowledge domains

Prompt:

1/18/2026, 10:16:42 AM

I want to reframe the medical literature project a bit, allowing it to be generalized to other domains of knowledge. We are still building a graph, and a graph still consists of nodes (entities) and edges (relationships). We still have a collection of entities from previous ingestion runs. We add one new notion: an entity may be "canonical", meaning it has been assigned a canonical ID (a UMLS number or the like), or "provisional", meaning we don't yet know whether it should be assigned a canonical ID; for instance, an entity might be a mention of some trivial thing in just one paper.

Given a batch of papers to ingest, we proceed in two passes. In the first pass we extract entities and assign canonical IDs where they make sense. In the second pass we identify the edges (for the medical domain, edges are of three types: extraction, claims, and evidence). The first pass produces a JSON serialization of the collection of entities.
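The two-pass scheme above can be sketched as follows. The extraction logic is stubbed out with toy lookups; the canonical-ID table, paper format, and function names are all assumptions made for illustration, with edge types following the medical example (extraction, claims, evidence).

```python
import json

# Toy canonical-ID lookup standing in for a real UMLS (or other) resolver.
CANONICAL_IDS = {"aspirin": "UMLS:C0004057"}

def pass_one(papers):
    """Pass 1: extract entities, assigning canonical IDs where known.
    Entities with no known ID remain provisional (canonical_id=None)."""
    entities = {}
    for paper in papers:
        for mention in paper["mentions"]:
            key = mention.lower()
            entities[key] = {
                "mention": mention,
                "canonical_id": CANONICAL_IDS.get(key),  # None => provisional
            }
    return entities

def pass_two(papers, entities):
    """Pass 2: identify edges between entities found in pass 1."""
    edges = []
    for paper in papers:
        for subj, etype, obj in paper.get("relations", []):
            if subj.lower() in entities and obj.lower() in entities:
                edges.append({"subject": subj.lower(), "type": etype,
                              "object": obj.lower(), "paper": paper["id"]})
    return edges

papers = [{"id": "p1",
           "mentions": ["Aspirin", "Headache"],
           "relations": [("Aspirin", "claims", "Headache")]}]
entities = pass_one(papers)
edges = pass_two(papers, entities)
serialized = json.dumps(entities)   # pass-1 output is JSON-serializable
```

Separating the passes means pass 2 can assume every entity it links already exists, which is what makes the JSON hand-off between passes a clean interface.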