wware / Future_Directions.md
Last active February 10, 2026 06:33
Future directions for the knowledge graph stuff
wware / Plan9.md
Last active February 3, 2026 20:58

Generalizing literature graphs across knowledge domains

Prompt:

1/18/2026, 10:16:42 AM

I want to reframe the medical literature project a bit to allow it to be generalized to other domains of knowledge. We are still building a graph, and a graph still consists of nodes (entities) and edges (relationships). We still have a collection of entities from previous ingestion processes. We add a new thing: entities may be "canonical", meaning they have been assigned canonical IDs (UMLS numbers or whatever), or they may be "provisional", meaning we don't know yet whether they should be assigned canonical IDs; for instance, an entity might be a mention of some trivial thing in just one paper.

Given a batch of papers to ingest, we proceed in two passes. In the first pass we extract entities and assign canonical IDs where they make sense. In the second pass we identify the edges (for medical, these edges are of three types: extraction, claims, and evidence). The first pass produces a JSON serialization of the collection of entities.
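
As a rough illustration of that first-pass output, here is a minimal sketch of how the entity collection might be modeled and serialized. The class and field names (`Entity`, `EntityStatus`, `canonical_id`, and so on) are hypothetical stand-ins, not the project's actual schema.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional


class EntityStatus(str, Enum):
    CANONICAL = "canonical"      # assigned a canonical ID (e.g. a UMLS CUI)
    PROVISIONAL = "provisional"  # seen in the corpus, not yet worth a canonical ID


@dataclass
class Entity:
    name: str                           # surface form as extracted from the paper
    status: EntityStatus
    canonical_id: Optional[str] = None  # UMLS-style CUI or other registry ID
    source_paper: Optional[str] = None  # where a provisional mention was found


def serialize_entities(entities: list[Entity]) -> str:
    """First-pass output: a JSON serialization of the entity collection."""
    records = []
    for e in entities:
        rec = asdict(e)
        rec["status"] = e.status.value  # store the plain string in the JSON
        records.append(rec)
    return json.dumps(records, indent=2)


entities = [
    Entity("diabetes mellitus", EntityStatus.CANONICAL, canonical_id="C0011849"),
    Entity("custom wrist splint", EntityStatus.PROVISIONAL, source_paper="PMC1234567"),
]
print(serialize_entities(entities))
```

The second pass would then read this JSON back and emit edges that reference entities by their IDs, so the edge-extraction step never has to re-resolve entity identity.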

Battle Plan: Medical Knowledge Graph Ingestion Optimization

This document outlines the strategic roadmap for integrating Large Language Models (LLMs) into the ingestion pipeline, specifically targeting Stage 4 (Claims Extraction) and Stage 5 (Evidence Aggregation).
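
For orientation, below is a minimal sketch of what an LLM-backed Stage 4 could look like. The function name, prompt wording, JSON fields, and the `complete()` callable standing in for whichever model client is chosen are all assumptions, not the project's actual interfaces.

```python
import json
from typing import Callable

# Hypothetical stand-in for the model client selected in section 1
# (local Ollama, hosted API, etc.); it takes a prompt and returns text.
CompleteFn = Callable[[str], str]

CLAIMS_PROMPT = """You are extracting claims from a medical research paper.
Given the paper text and the known entities, return a JSON list of objects
with the fields: subject_id, predicate, object_id, sentence.

Entities: {entities}

Paper text:
{text}
"""


def extract_claims(paper_text: str, entity_ids: list[str], complete: CompleteFn) -> list[dict]:
    """Stage 4 (Claims Extraction): ask the LLM for structured claim edges."""
    prompt = CLAIMS_PROMPT.format(entities=", ".join(entity_ids), text=paper_text)
    raw = complete(prompt)
    try:
        return json.loads(raw)  # expect a JSON array of claim objects
    except json.JSONDecodeError:
        return []               # malformed output: skip or retry in a real pipeline
```

Stage 5 (Evidence Aggregation) would consume these claim objects and roll them up across papers, which is why keeping the output strictly structured matters more than prose quality here.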

1. LLM Infrastructure Strategy

The choice of model is critical for handling the specialized technical language and structured data extraction requirements of medical research papers.

| Component | Model Recommendation | Rationale |
| --- | --- | --- |

Short answer: yes, this test suite makes a lot of sense, and it’s actually quite strong architecturally. Longer answer below, with praise and some concrete suggestions where you might tighten it.


What you’re doing well

1. Clear layering and intent

Your tests map very cleanly onto your system architecture:

wware / SBIR.md
Last active January 11, 2026 14:34

Great work on the pipeline refactoring! This is a solid Unix-style architecture with clean separation of concerns:

What you've built:

  • Modular stages - Each pipeline script is independent and can be run separately
  • Interface-based design - Storage, parsers, and embeddings all use ABC interfaces (see the sketch after this list)
  • Swappable backends - SQLite for dev/testing, PostgreSQL+pgvector for production
  • Clean data flow - Each stage reads/writes through well-defined interfaces
  • Comprehensive docs - README and TESTING guide are clear and helpful
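
To make the interface-based design concrete, here is a rough sketch of what an ABC-based storage layer with swappable backends could look like. The class and method names are hypothetical; the real interfaces in the repo will differ.

```python
import sqlite3
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """Hypothetical storage interface; each backend implements the same contract."""

    @abstractmethod
    def save(self, doc_id: str, text: str) -> None: ...

    @abstractmethod
    def load(self, doc_id: str) -> str | None: ...


class SQLiteStore(DocumentStore):
    """Dev/testing backend backed by a local SQLite file."""

    def __init__(self, path: str = "dev.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, text TEXT)")

    def save(self, doc_id: str, text: str) -> None:
        self.conn.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, text))
        self.conn.commit()

    def load(self, doc_id: str) -> str | None:
        row = self.conn.execute("SELECT text FROM docs WHERE id = ?", (doc_id,)).fetchone()
        return row[0] if row else None


# A PostgreSQL + pgvector backend would implement the same two methods, so
# pipeline stages depend only on DocumentStore and never on a concrete database.
```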

GPU-Accelerated AI Development with Lambda Labs

Like many developers, you may get tired of paying subscription fees to use online LLMs, especially when you hit usage limits or get throttled. Running models locally with Ollama is great, but without a GPU, performance suffers. Cloud GPU instances from services like Lambda Labs, RunPod, and others offer a cost-effective middle ground.

This guide focuses on Lambda Labs because of their straightforward pricing, good GPU selection, and excellent performance. The setup takes about 10-15 minutes and provides enormous speedups over CPU-only inference—typically 10-100x faster depending on the workload.
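
Once a GPU instance is running Ollama, you can call it from your laptop over Ollama's HTTP API. The host address and model name below are placeholders, and this assumes port 11434 has been tunneled or opened to you.

```python
import json
import urllib.request

# Placeholder endpoint: e.g. reached through an SSH tunnel
# (ssh -L 11434:localhost:11434 ubuntu@<instance-ip>) to the GPU box.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = json.dumps({
    "model": "llama3",   # whatever model you pulled on the instance
    "prompt": "Summarize the benefits of running LLMs on a cloud GPU.",
    "stream": False,     # return one JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(OLLAMA_URL, data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```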

Table of Contents

  1. Quick Overview
  2. Instance Selection Guide
wware / 0_README.md
Last active December 30, 2025 18:40

Graph-RAG with Neo4j and MCP

A hands-on introduction to graph databases using Neo4j's classic movie dataset, accessible through both the Neo4j web interface and AI-powered natural language queries via Cursor IDE.

What Are Graph Databases?

Imagine you're organizing information about movies and actors. A traditional database stores these as separate tables:

  • Movies Table
  • Actors Table
  • Acted_In Table
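
In a graph database, the same facts live as nodes and relationships instead of join tables. As a rough sketch (the connection details are placeholders for a local Neo4j instance), here is how the classic movie dataset can be queried with the official Neo4j Python driver:

```python
from neo4j import GraphDatabase  # pip install neo4j

# Placeholder connection details; adjust for your Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Instead of joining Movies, Actors, and Acted_In tables, we traverse
# (:Person)-[:ACTED_IN]->(:Movie) relationships directly.
query = """
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)
RETURN m.title AS title, m.released AS released
ORDER BY m.released
"""

with driver.session() as session:
    for record in session.run(query, name="Keanu Reeves"):
        print(record["title"], record["released"])

driver.close()
```

The same relationship can also be reached through natural language via Cursor and MCP, which translates questions like "Which movies did Keanu Reeves act in?" into queries along these lines.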