ARC42 as RDF

RDF‑First System Knowledge Graph for Arc42 (File‑First, Rust‑Centric)

A practical guide for reverse‑engineering a Flux‑deployed, multi‑cluster EKS SaaS into a queryable knowledge graph that can generate arc42 documentation, support operational reasoning, and integrate with LLM tooling — without Neo4j, Jena, or long‑running servers.


0. Executive Summary

This project treats architecture and operations documentation as a derived artifact.

  • Source of truth: RDF stored as plain files (Turtle / N‑Quads) in Git
  • Inputs: Flux + Helm + Kubernetes + GitHub + Observability + Humans
  • Outputs:
    • arc42 documentation (generated, patchable)
    • service “support cards”
    • dependency & blast‑radius views
    • queryable system model (SPARQL‑ish)
  • Tooling philosophy:
    • file‑first, reproducible, diffable
    • Rust‑native tools you control
    • no heavyweight graph databases

The result is a living, queryable model of intent vs reality — suitable for humans, automation, and LLM‑driven assistants.


1. Core Principles

1.1 Documentation Is a View, Not the Truth

  • arc42 sections are queries over the model
  • Markdown is generated output, not the authoritative store
  • Humans can add narrative overlays without breaking regeneration

1.2 RDF Is the Right Substrate (Here)

RDF works well because:

  • schema is incomplete and evolving
  • provenance matters
  • contradictions must coexist
  • graph queries dominate

We explicitly do not pursue Semantic Web purity, OWL reasoning, or external triple stores.

1.3 File‑First > Always‑On Databases

  • RDF lives in Git
  • ingestion produces append‑only facts
  • derived graphs are regeneratable
  • queries run locally via Rust CLI

This aligns with IaC, GitOps, and good ops hygiene.


2. System Model Overview

2.1 What We Model

  • Services (runtime, operable units)
  • Deployments (Flux → Helm → K8S workloads)
  • Clusters / Namespaces / Environments
  • Repositories & ownership
  • Tenants (application + security)
  • Dependencies (service‑to‑service, infra)
  • Observability (dashboards, alerts, SLOs)
  • Runbooks & human knowledge

2.2 What We Don’t Model Directly

  • raw logs or metrics (queried live)
  • ephemeral pod‑level state
  • full trace graphs

Instead, we store pointers and query templates that resolve against the live systems.


3. Ontology (Minimal, Opinionated)

3.1 Core Classes

ex:System
ex:Service
ex:Component
ex:Tenant
ex:Environment

k8s:Cluster
k8s:Namespace
k8s:Workload (Deployment, StatefulSet, Job, etc.)

flux:Kustomization
flux:HelmRelease
flux:HelmChart

scm:Repo
scm:Commit

org:Team
org:OnCall

obs:Dashboard
obs:Monitor
obs:SLO
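
A minimal sketch of how ontology/core.ttl (and its siblings) might declare these as plain RDFS classes, consistent with the no‑OWL stance above; every prefix IRI here is an illustrative placeholder, not a committed choice:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.org/kg/core#> .
@prefix k8s:  <https://example.org/kg/k8s#> .
@prefix flux: <https://example.org/kg/flux#> .

# Plain RDFS class declarations -- no OWL axioms by design
ex:System      a rdfs:Class .
ex:Service     a rdfs:Class .
ex:Component   a rdfs:Class .
ex:Tenant      a rdfs:Class .
ex:Environment a rdfs:Class .

k8s:Cluster   a rdfs:Class .
k8s:Namespace a rdfs:Class .
k8s:Workload  a rdfs:Class .

flux:Kustomization a rdfs:Class .
flux:HelmRelease   a rdfs:Class .
flux:HelmChart     a rdfs:Class .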

3.2 Core Relationships

ex:runsAs            Service → Workload
ex:dependsOn         Service → Service|Infra
ex:ownedBy           Service → Team
ex:hasTenant         Service → Tenant
ex:inEnv             Service → Environment

k8s:inNamespace      Workload → Namespace
k8s:inCluster        Namespace → Cluster

flux:releases        HelmRelease → HelmChart
flux:manages         Kustomization → HelmRelease

scm:sourceRepo       Service → Repo
scm:deployCommit     HelmRelease → Commit

obs:hasDashboard     Service → Dashboard
obs:hasMonitor       Service → Monitor
obs:hasSLO           Service → SLO

This is intentionally boring — and powerful.
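
As a concrete illustration, a single service wired up with these properties might look like this in Turtle; every name below is invented:

@prefix ex:  <https://example.org/kg/core#> .
@prefix scm: <https://example.org/kg/scm#> .
@prefix org: <https://example.org/kg/org#> .
@prefix svc: <https://example.org/kg/service/> .

# A hypothetical "checkout" service and its edges
svc:checkout a ex:Service ;
    ex:runsAs      <https://example.org/kg/k8s/prod-us-1/shop/Deployment/checkout> ;
    ex:dependsOn   svc:payments, svc:inventory ;
    ex:ownedBy     org:team-commerce ;
    ex:inEnv       <https://example.org/kg/env/prod> ;
    scm:sourceRepo <https://github.com/example/checkout> .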


4. Repository Layout

system-kg/
  ontology/
    core.ttl
    k8s.ttl
    flux.ttl
    ops.ttl
  facts/
    flux/
    k8s/
    repos/
    services/
    tenants/
    humans/
  derived/
    current_state.ttl
    arc42_model.ttl
  docs/
    overlays/
      services/
      concepts/
  arc42/
    system.md
    services/
  queries/
    arc42/
    lint/
    ad-hoc/
  tools/
    kg/   # Rust CLI

Key Rules

  • facts/ = observed or curated truth (append‑only)
  • derived/ = regeneratable
  • docs/overlays/ = human narrative
  • arc42/ = generated output

5. Input Gathering Strategy

Phase 1 — Runtime Inventory (High ROI)

Sources:

  • Flux Git repos
  • HelmRelease + Kustomization manifests
  • Live K8S clusters

Extract:

  • clusters, namespaces
  • workloads
  • helm chart versions
  • values references

This alone enables the arc42 Deployment View.
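
For example, one HelmRelease manifest plus a cluster listing might flatten into facts like these (all names, namespaces, and chart versions invented):

@prefix flux: <https://example.org/kg/flux#> .
@prefix k8s:  <https://example.org/kg/k8s#> .

<https://example.org/kg/flux/prod-us-1/shop/checkout>
    a flux:HelmRelease ;
    flux:releases <https://example.org/kg/chart/checkout/1.4.2> .

<https://example.org/kg/chart/checkout/1.4.2> a flux:HelmChart .

<https://example.org/kg/k8s/prod-us-1/shop/Deployment/checkout>
    a k8s:Workload ;
    k8s:inNamespace <https://example.org/kg/k8s/prod-us-1/ns/shop> .

<https://example.org/kg/k8s/prod-us-1/ns/shop>
    k8s:inCluster <https://example.org/kg/k8s/cluster/prod-us-1> .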

Phase 2 — Service & Ownership Mapping

Sources:

  • repo topics
  • CODEOWNERS
  • internal conventions (service.yaml, chart layout)

Goal:

  • map workload → service → repo → team

Phase 3 — Dependencies & Interfaces

Sources:

  • ingress definitions
  • env vars / config values
  • service mesh telemetry (optional)
  • terraform references

Goal:

  • service dependency graph
  • blast radius reasoning
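
As one small example, an extractor that sees an env var such as PAYMENTS_URL on the checkout workload (names hypothetical) could emit a single edge:

@prefix ex:  <https://example.org/kg/core#> .
@prefix svc: <https://example.org/kg/service/> .

# Heuristic evidence: checkout's config references payments
svc:checkout ex:dependsOn svc:payments .

Because config references are heuristic evidence, these edges are natural candidates for the optional confidence scores described in section 6.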

Phase 4 — Tenancy Model

Treat tenants as first‑class nodes:

  • application tenants
  • security realms / auth domains

Link tenants to services and (optionally) namespaces.
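
A minimal sketch, with an invented tenant:

@prefix ex:     <https://example.org/kg/core#> .
@prefix svc:    <https://example.org/kg/service/> .
@prefix tenant: <https://example.org/kg/tenant/> .

tenant:acme a ex:Tenant .
svc:checkout ex:hasTenant tenant:acme .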

Phase 5 — Observability & Support

Sources:

  • Datadog dashboards / monitors
  • Prometheus rules
  • runbooks (git or Confluence)

Store:

  • IDs / URLs
  • query templates
  • runbook links

6. Provenance & Confidence

Every fact should be attributable:

  • source system
  • file path / object ID
  • timestamp
  • (optional) confidence score

Use:

  • file boundaries
  • named graphs
  • prov:wasDerivedFrom

This enables disagreement tracking and auditability.
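
In TriG terms (the named‑graph sibling of Turtle), each ingestion run can land in its own named graph, with provenance attached to that graph; the file path and timestamp below are invented:

@prefix ex:   <https://example.org/kg/core#> .
@prefix svc:  <https://example.org/kg/service/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix g:    <https://example.org/kg/graph/> .

g:flux-scan-2024-06-01 {
    svc:checkout ex:dependsOn svc:payments .
}

# Statements about the graph itself live in the default graph
g:flux-scan-2024-06-01
    prov:wasDerivedFrom <https://github.com/example/flux-repo/blob/main/apps/checkout/release.yaml> ;
    prov:generatedAtTime "2024-06-01T12:00:00Z"^^xsd:dateTime .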


7. Arc42 as Queries

arc42 sections map cleanly to graph views:

arc42 Section     Backing Query
Context           Service + external deps
Building Blocks   Services + dependsOn
Deployment        Service → Workload → Namespace → Cluster
Runtime           Monitors + dashboards + scenarios
Ops               Runbooks + oncall + alerts
Risks             Missing SLOs, fragile deps

Render via templates fed by SPARQL‑ish queries.
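
For instance, the Deployment row above could be backed by a query along these lines (standard SPARQL shown as a stand‑in for the SPARQL‑ish DSL; prefixes as in the earlier sketches):

PREFIX ex:  <https://example.org/kg/core#>
PREFIX k8s: <https://example.org/kg/k8s#>

SELECT ?service ?workload ?namespace ?cluster WHERE {
  ?service   a ex:Service ;
             ex:runsAs ?workload .
  ?workload  k8s:inNamespace ?namespace .
  ?namespace k8s:inCluster ?cluster .
}
ORDER BY ?cluster ?namespace ?service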


8. LLM / Claude‑Style Querying

The LLM is a planner, not a source of truth.

Suggested tools:

  • kg query <sparql>
  • kg render arc42
  • kg lint
  • obs logs <service> <since>
  • obs metrics <service> <expr>

Optional:

  • vector index (Qdrant) for prose
  • link embeddings back to RDF IRIs

Flow:

  1. retrieve candidates (vector)
  2. confirm structure (RDF)
  3. fetch live data (observability)
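
Step 2 can be as cheap as an ASK query that verifies a retrieved claim before it is repeated to the user (hypothetical names again):

PREFIX ex:  <https://example.org/kg/core#>
PREFIX svc: <https://example.org/kg/service/>

# Does the model actually assert this dependency?
ASK { svc:checkout ex:dependsOn svc:payments }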

9. Limitations & Non‑Goals

Explicitly out of scope:

  • real‑time graph mutation
  • heavy inference engines
  • UI‑first graph editors
  • auto‑resolving contradictions

This system records reality — it does not sanitize it.


10. First Milestone (Recommended)

Scope:

  • 1 environment (prod)
  • 1–2 clusters
  • 5 core services

Deliverables:

  • Flux + K8S inventory RDF
  • service ownership mapping
  • dependency edges
  • generated arc42 Deployment + Building Block views
  • kg lint checks (owner, runbook, dashboard); see the sketch below

Once this loop works, scaling to 30+ services is incremental.
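
The lint deliverable reduces to absence queries; a sketch of the missing‑owner check, under the same illustrative prefixes as above:

PREFIX ex: <https://example.org/kg/core#>

SELECT ?service WHERE {
  ?service a ex:Service .
  FILTER NOT EXISTS { ?service ex:ownedBy ?team }
}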


11. Why This Works

This approach aligns with:

  • GitOps discipline
  • infra‑as‑code thinking
  • real ops workflows
  • LLM‑assisted reasoning (grounded in the graph, not free recall)

It revives RDF for what it does best:

holding structured truth, uncertainty, and provenance — at the same time.


12. Next Steps

Possible follow‑ups:

  • flesh out core.ttl starter ontology
  • define canonical IRIs (k8s:<cluster>/<ns>/<kind>/<name>; example after this list)
  • design a minimal SPARQL‑ish DSL for Rust
  • write a Flux → RDF extractor
  • write arc42 renderer templates
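
To make the IRI bullet concrete, the pattern instantiated for an invented workload:

# k8s:<cluster>/<ns>/<kind>/<name> instantiated:
<https://example.org/kg/k8s/prod-us-1/shop/Deployment/checkout>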

This document is intentionally sufficient to start — without committing you to any particular implementation too early.
