Skip to content

Instantly share code, notes, and snippets.

@niquola
Last active March 23, 2026 10:34
Show Gist options
  • Select an option

  • Save niquola/6bcd4cbb2e7b294d839880d839bbcc31 to your computer and use it in GitHub Desktop.

Select an option

Save niquola/6bcd4cbb2e7b294d839880d839bbcc31 to your computer and use it in GitHub Desktop.
OMOP CDM Knowledge Graph — Claude Code skill (omop-kg)
name omop-kg
description OMOP CDM Knowledge Graph — ontology of 643 canonical concepts, 1067 relationships, 1099 doc chunks with embeddings. Extracted from CommonDataModel docs + The Book of OHDSI. Use when user asks about OMOP CDM tables, fields, concepts, ETL, cohorts, vocabularies, or needs to search OMOP documentation.
allowed-tools Bash(curl *https://omop-kg.apki.dev*)

OMOP CDM Knowledge Graph

643 canonical concepts, 1057 raw extractions, 1067 canonical relationships, 1099 chunks, 66 pages, 1666 embeddings.

Sources: OHDSI CommonDataModel v5.4 docs (38 pages), The Book of OHDSI (23 chapters), LLM relationship refinements.

UI Browser

Interactive knowledge graph browser at http://localhost:3333 (bun --hot ui.ts from the project root).

  • / — A-Z dictionary with stats (concepts, relationships, chunks, embeddings)
  • /concept/<name> — concept detail: description, pages, raw extractions, relationships, similar concepts (BM25+vector RRF), relevant chunks (BM25+vector RRF with highlights)
  • /search?q=<query> — hybrid search: concepts (BM25+vector) + chunks (BM25+vector), grouped by page with heading hierarchy
  • /page/<id> — page view: summary, audience, topics, defines, key takeaways, extracted concepts + relationships
  • /chunks/<page_id> — hierarchical chunk browser with markdown rendering, anchor links (#chunk-<id>)
  • /graph — d3.js force-directed graph (click=highlight, dblclick=ego-graph, empty=top connected nodes)
  • /api/sql — readonly SQL API (POST JSON {"sql":"..."}, returns JSON array)

Hybrid Search (recommended)

BM25 + vector RRF search across concepts and chunks in one call:

curl -s 'https://omop-kg.apki.dev/api/search?q=drug+era&limit=10'

Returns {query, concepts: [...], chunks: [...]} — each with bm25, cos, rrf scores and match (both/bm25/semantic).

SQL Query Tool

Readonly SQL endpoint. POST JSON with sql field, get JSON array back.

curl -s https://omop-kg.apki.dev/api/sql -d '{"sql":"SELECT name, type FROM cdm_canonical_concepts LIMIT 5"}'

Tables

cdm_canonical_concepts (643 rows) — deduplicated concepts

Column Type Description
id TEXT PK lower(name)
name TEXT Canonical name
kind TEXT class or instance
type TEXT See types below
category TEXT See categories below
description TEXT Best description
alt_names TEXT[] Alternative names merged into this
defined_pages TEXT[] Pages where concept is defined

cdm_concepts (1057 rows) — raw LLM extractions

Column Type Description
id SERIAL PK
name TEXT Concept name as extracted
kind TEXT class or instance
type TEXT Concept type
category TEXT Domain category
description TEXT Description with doc quotes
relevance TEXT defined / described / mentioned
section TEXT Section heading
page_id TEXT FK Source page id

cdm_canonical_relationships (1067 rows) — deduplicated relationships

Column Type Description
id SERIAL PK
source TEXT Source concept name
relation TEXT Relation type
target TEXT Target concept name
contexts TEXT[] All context descriptions
pages TEXT[] All source pages
count INT How many raw sources

cdm_relationships (1126 rows) — raw extracted relationships

Column Type Description
source TEXT Source concept
relation TEXT Relation type
target TEXT Target concept
context TEXT Brief explanation
page_id TEXT FK Source page id

cdm_pages (66 rows) — documentation pages with full text

Column Type Description
id TEXT PK Page slug (e.g. faq, book-CommonDataModel)
title TEXT Page title
summary TEXT LLM-generated summary
content TEXT Full markdown content
size_bytes INT Content length
meta JSONB {audience, topics, defines, key_takeaways}

cdm_chunks (1099 rows) — markdown-parsed documentation chunks

Column Type Description
id SERIAL PK
page_id TEXT FK Source page
h1–h5 TEXT Heading hierarchy
depth INT Heading depth
seq INT Order within page
type TEXT paragraph / code / list / table / frontmatter / image
lang TEXT Language for code blocks
content TEXT Full chunk text
size INT Content length

BM25 index on content: WHERE id @@@ 'content:drug AND content:era'

cdm_embeddings (1666 rows) — OpenAI text-embedding-3-large 1536d

Column Type Description
id TEXT PK concept-name or chunk-<id>
name TEXT Display name
source TEXT concept or chunk
source_id INT Chunk id if source=chunk
embedding vector(1536) Embedding vector

cdm_tables (39 rows) / cdm_fields (432 rows) — CDM v5.4 schema

SELECT field_name, datatype, is_required, is_fk, fk_table, fk_domain
FROM cdm_fields WHERE table_name = 'drug_era' ORDER BY id;

OMOP CDM v5.4 Tables (39)

Clinical Events (CDM schema): person*, observation_period*, visit_occurrence, visit_detail, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, note, note_nlp, specimen, death, episode, episode_event, fact_relationship

Health System: location, care_site, provider

Health Economics: payer_plan_period, cost

Derived: drug_era, condition_era, dose_era

Vocabulary (VOCAB schema): concept, vocabulary, domain, concept_class, concept_relationship, relationship, concept_synonym, concept_ancestor, source_to_concept_map, drug_strength

Results: cohort, cohort_definition

Metadata: cdm_source, metadata

* = required tables (person, observation_period)

cdm_merge_epochs / cdm_merge_decisions — dedup audit trail

Types (11)

Type Count
table 105
pattern 104
process 103
tool 68
field 59
vocabulary 54
convention 48
concept-domain 34
concept-class 30
standard 23
relationship-type 9

Categories (10)

Category Count
analytics 193
vocabulary 108
clinical 84
etl 55
derived 38
metadata 36
quality 33
health-system 31
administration 28
financial 20

Relation Types (24)

Relation Count
uses 305
has-part 174
related-to 131
constrains 84
part-of 71
constrained-by 50
references 47
is-a 37
derives-from 31
stored-by 26
used-by 26
has-subtype 19
alternative-to 18
maps-to 17
source-of 12

Reading Documentation Pages

List all pages

SELECT id, left(summary, 100) as summary, meta->>'audience' as audience
FROM cdm_pages WHERE content IS NOT NULL ORDER BY id;

Page table of contents (heading hierarchy)

SELECT h1, h2, h3, min(seq) as seq
FROM cdm_chunks WHERE page_id = 'book-Cohorts' AND type != 'frontmatter'
GROUP BY h1, h2, h3 ORDER BY min(seq);

Read a specific section by seq number

SELECT content FROM cdm_chunks WHERE page_id = 'book-Cohorts' AND seq = 26;

Read all chunks of a section by heading

SELECT seq, type, content FROM cdm_chunks
WHERE page_id = 'book-Cohorts' AND h2 = 'Implementing a Cohort Using ATLAS'
ORDER BY seq;

Example Queries

-- Find concept
SELECT name, type, category, left(description, 300)
FROM cdm_canonical_concepts WHERE name = 'Drug Era';

-- Fuzzy search
SELECT name, type, category
FROM cdm_canonical_concepts WHERE name ILIKE '%cohort%' ORDER BY name;

-- Relationships for a concept
SELECT source, relation, target, pages
FROM cdm_canonical_relationships
WHERE source = 'Drug Era' OR target = 'Drug Era'
ORDER BY relation;

-- CDM table fields
SELECT field_name, datatype, is_required, is_fk, fk_table, fk_domain
FROM cdm_fields WHERE table_name = 'condition_occurrence' ORDER BY id;

-- Table descriptions
SELECT name, left(description, 200) FROM cdm_tables ORDER BY name;

-- BM25 search in chunks
SELECT id, page_id, h1, h2, type
FROM cdm_chunks WHERE id @@@ 'content:drug AND content:era'
LIMIT 10;

-- Vector similar concepts
SELECT e2.name, round((1 - (e1.embedding <=> e2.embedding))::numeric, 3) as cos
FROM cdm_embeddings e1
JOIN cdm_embeddings e2 ON e1.id != e2.id AND e2.source = 'concept'
WHERE e1.id = 'drug era' AND e1.source = 'concept'
ORDER BY e1.embedding <=> e2.embedding LIMIT 10;

-- Vector similar chunks to a concept
SELECT c.page_id, c.h1, c.h2, c.type, round((1 - (e.embedding <=> ce.embedding))::numeric, 3) as cos
FROM cdm_embeddings e
JOIN cdm_embeddings ce ON ce.source = 'chunk'
JOIN cdm_chunks c ON c.id = ce.source_id
WHERE e.id = 'drug era' AND e.source = 'concept'
ORDER BY e.embedding <=> ce.embedding LIMIT 10;

-- Page summaries
SELECT id, left(summary, 200), meta->>'audience' as audience
FROM cdm_pages WHERE meta IS NOT NULL ORDER BY id;

-- Most connected concepts
SELECT name, type, category,
  (SELECT count(*) FROM cdm_canonical_relationships r WHERE r.source = c.name OR r.target = c.name) as rels
FROM cdm_canonical_concepts c ORDER BY rels DESC LIMIT 20;

-- Concepts defined on a page
SELECT name, kind, type, relevance
FROM cdm_concepts WHERE page_id = 'book-Cohorts' ORDER BY relevance, name;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment