A practical guide for reverse‑engineering a Flux‑deployed, multi‑cluster EKS SaaS into a queryable knowledge graph that can generate arc42 documentation, support operational reasoning, and integrate with LLM tooling — without Neo4j, Jena, or long‑running servers.
This project treats architecture and operations documentation as a derived artifact.
- Source of truth: RDF stored as plain files (Turtle / N‑Quads) in Git
- Inputs: Flux + Helm + Kubernetes + GitHub + Observability + Humans
- Outputs:
  - arc42 documentation (generated, patchable)
  - service “support cards”
  - dependency & blast-radius views
  - queryable system model (SPARQL-ish)
- Tooling philosophy:
  - file-first, reproducible, diffable
  - Rust-native tools you control
  - no heavyweight graph databases
The result is a living, queryable model of intent vs reality — suitable for humans, automation, and LLM‑driven assistants.
Concretely:
- arc42 sections are queries over the model
- Markdown is generated output, not the authoritative store
- Humans can add narrative overlays without breaking regeneration
RDF works well because:
- schema is incomplete and evolving
- provenance matters
- contradictions must coexist
- graph queries dominate
We explicitly do not pursue Semantic Web purity, OWL reasoning, or external triple stores.
Instead:
- RDF lives in Git
- ingestion produces append‑only facts
- derived graphs are regeneratable
- queries run locally via Rust CLI
This aligns with IaC, GitOps, and good ops hygiene.
The model covers:
- Services (runtime, operable units)
- Deployments (Flux → Helm → K8S workloads)
- Clusters / Namespaces / Environments
- Repositories & ownership
- Tenants (application + security)
- Dependencies (service‑to‑service, infra)
- Observability (dashboards, alerts, SLOs)
- Runbooks & human knowledge
We deliberately do not store:
- raw logs or metrics (queried live)
- ephemeral pod‑level state
- full trace graphs
Instead, we store pointers and templates for live systems.
Core classes:
- `ex:System`
- `ex:Service`
- `ex:Component`
- `ex:Tenant`
- `ex:Environment`
- `k8s:Cluster`
- `k8s:Namespace`
- `k8s:Workload` (Deployment, StatefulSet, Job, etc.)
- `flux:Kustomization`
- `flux:HelmRelease`
- `k8s:HelmChart`
- `scm:Repo`
- `scm:Commit`
- `org:Team`
- `org:OnCall`
- `obs:Dashboard`
- `obs:Monitor`
- `obs:SLO`
Core properties (domain → range):
- `ex:runsAs`: Service → Workload
- `ex:dependsOn`: Service → Service | Infra
- `ex:ownedBy`: Service → Team
- `ex:hasTenant`: Service → Tenant
- `ex:inEnv`: Service → Environment
- `k8s:inNamespace`: Workload → Namespace
- `k8s:inCluster`: Namespace → Cluster
- `flux:releases`: HelmRelease → HelmChart
- `flux:manages`: Kustomization → HelmRelease
- `scm:sourceRepo`: Service → Repo
- `scm:deployCommit`: HelmRelease → Commit
- `obs:hasDashboard`: Service → Dashboard
- `obs:hasMonitor`: Service → Monitor
- `obs:hasSLO`: Service → SLO
This is intentionally boring — and powerful.
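A minimal instance-level sketch (hypothetical prefix IRIs, a made-up `payments-api` service, and dot-joined local names standing in for canonical IRIs) shows how the classes and properties compose:

```turtle
@prefix ex:  <https://example.org/arch/> .
@prefix k8s: <https://example.org/k8s/> .
@prefix scm: <https://example.org/scm/> .
@prefix org: <https://example.org/org/> .

ex:payments-api
    a              ex:Service ;
    ex:runsAs      k8s:prod-eu.payments.deploy.payments-api ;
    ex:dependsOn   ex:ledger-api, ex:postgres-payments ;
    ex:ownedBy     org:team-payments ;
    ex:hasTenant   ex:tenant-acme ;
    ex:inEnv       ex:env-prod ;
    scm:sourceRepo scm:repo.payments-api .

k8s:prod-eu.payments.deploy.payments-api
    a               k8s:Workload ;
    k8s:inNamespace k8s:prod-eu.payments .

k8s:prod-eu.payments
    a             k8s:Namespace ;
    k8s:inCluster k8s:prod-eu .
```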
Repository layout:

    system-kg/
      ontology/
        core.ttl
        k8s.ttl
        flux.ttl
        ops.ttl
      facts/
        flux/
        k8s/
        repos/
        services/
        tenants/
        humans/
      derived/
        current_state.ttl
        arc42_model.ttl
      docs/
        overlays/
          services/
          concepts/
        arc42/
          system.md
          services/
      queries/
        arc42/
        lint/
        ad-hoc/
      tools/
        kg/          # Rust CLI
- `facts/` = observed or curated truth (append-only)
- `derived/` = regeneratable
- `docs/overlays/` = human narrative
- `arc42/` = generated output
Ingestion proceeds in passes, starting with the Flux and Kubernetes inventory.
Sources:
- Flux Git repos
- HelmRelease + Kustomization manifests
- Live K8S clusters
Extract:
- clusters, namespaces
- workloads
- helm chart versions
- values references
This alone enables the arc42 Deployment View.
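A sketch of what this extractor might emit for one release (cluster, namespace, chart name, and version are all made up):

```turtle
@prefix k8s:  <https://example.org/k8s/> .
@prefix flux: <https://example.org/flux/> .

k8s:prod-eu          a k8s:Cluster .
k8s:prod-eu.payments a k8s:Namespace ;
    k8s:inCluster    k8s:prod-eu .

flux:ks.apps a flux:Kustomization ;
    flux:manages flux:hr.payments-api .

flux:hr.payments-api a flux:HelmRelease ;
    flux:releases    k8s:chart.payments-api-1.4.2 .

k8s:chart.payments-api-1.4.2 a k8s:HelmChart .
```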
Next, establish service identity and ownership.
Sources:
- repo topics
- CODEOWNERS
- internal conventions (`service.yaml`, chart layout)
Goal:
- map workload → service → repo → team
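One possible shape for the ownership facts this pass produces (service, repo, and team identifiers are illustrative):

```turtle
@prefix ex:  <https://example.org/arch/> .
@prefix k8s: <https://example.org/k8s/> .
@prefix scm: <https://example.org/scm/> .
@prefix org: <https://example.org/org/> .

ex:payments-api
    ex:runsAs      k8s:prod-eu.payments.deploy.payments-api ;  # from workload labels / chart layout
    scm:sourceRepo scm:repo.payments-api ;                     # from repo topics / service.yaml
    ex:ownedBy     org:team-payments .                         # from CODEOWNERS
```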
Then extract dependencies.
Sources:
- ingress definitions
- env vars / config values
- service mesh telemetry (optional)
- terraform references
Goal:
- service dependency graph
- blast radius reasoning
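Dependency edges might then look like this (services are hypothetical); blast radius is the reverse transitive closure of `ex:dependsOn`:

```turtle
@prefix ex: <https://example.org/arch/> .

# Edges inferred from ingress routes, env vars, and config values.
ex:checkout-web ex:dependsOn ex:payments-api .
ex:payments-api ex:dependsOn ex:ledger-api, ex:postgres-payments .
```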
Treat tenants as first‑class nodes:
- application tenants
- security realms / auth domains
Link tenants to services and (optionally) namespaces.
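A tenant sketch (the `ex:usesNamespace` link is illustrative; the tenant-to-namespace property is not named above):

```turtle
@prefix ex:  <https://example.org/arch/> .
@prefix k8s: <https://example.org/k8s/> .

ex:tenant-acme     a ex:Tenant .   # application tenant
ex:realm-acme-auth a ex:Tenant .   # security realm / auth domain

ex:payments-api ex:hasTenant     ex:tenant-acme .
ex:tenant-acme  ex:usesNamespace k8s:prod-eu.acme .   # optional link, property name assumed
```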
Finally, attach observability and runbooks.
Sources:
- Datadog dashboards / monitors
- Prometheus rules
- runbooks (git or Confluence)
Store:
- IDs / URLs
- query templates
- runbook links
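Observability facts stay as pointers and templates, for example (properties beyond `obs:hasDashboard` / `obs:hasMonitor` / `obs:hasSLO`, and the dashboard IDs and URLs, are assumptions):

```turtle
@prefix ex:  <https://example.org/arch/> .
@prefix obs: <https://example.org/obs/> .

ex:payments-api
    obs:hasDashboard obs:dd-dashboard-abc123 ;
    obs:hasMonitor   obs:dd-monitor-987 ;
    obs:hasSLO       obs:slo-payments-latency .

obs:dd-dashboard-abc123
    obs:url "https://app.datadoghq.com/dashboard/abc-123" .   # pointer, not data

obs:slo-payments-latency
    obs:queryTemplate "p99 latency query for payments-api" .  # template, filled at query time
```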
Every fact should be attributable:
- source system
- file path / object ID
- timestamp
- (optional) confidence score
Use:
- file boundaries
- named graphs
- `prov:wasDerivedFrom`
This enables disagreement tracking and auditability.
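Provenance can hang off the named graph (or file) that holds a batch of facts. A sketch using PROV-O, with a hypothetical graph IRI, source repo, and ingest-tool IRI (in N-Quads the same graph IRI would appear in the fourth position of each quad):

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix g:    <https://example.org/graphs/> .

g:flux-prod-eu-2024-06-01
    prov:wasDerivedFrom  <https://example.org/git/flux-prod-eu> ;
    prov:generatedAtTime "2024-06-01T08:15:00Z"^^xsd:dateTime ;
    prov:wasAttributedTo <https://example.org/tools/kg-ingest-flux> .
```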
arc42 sections map cleanly to graph views:
| arc42 Section | Backing Query |
|---|---|
| Context | Service + external deps |
| Building Blocks | Services + dependsOn |
| Deployment | Service → Workload → Namespace → Cluster |
| Runtime | Monitors + dashboards + scenarios |
| Ops | Runbooks + oncall + alerts |
| Risks | Missing SLOs, fragile deps |
Render via templates fed by SPARQL‑ish queries.
For LLM-assisted workflows, the LLM is a planner, not a source of truth.
Suggested tools:
- `kg query <sparql>`
- `kg render arc42`
- `kg lint`
- `obs logs <service> <since>`
- `obs metrics <service> <expr>`
Optional:
- vector index (Qdrant) for prose
- link embeddings back to RDF IRIs
Flow:
- retrieve candidates (vector)
- confirm structure (RDF)
- fetch live data (observability)
Explicitly out of scope:
- real‑time graph mutation
- heavy inference engines
- UI‑first graph editors
- auto‑resolving contradictions
This system records reality — it does not sanitize it.
Pilot scope:
- 1 environment (prod)
- 1–2 clusters
- 5 core services
Deliverables:
- Flux + K8S inventory RDF
- service ownership mapping
- dependency edges
- generated arc42 Deployment + Building Block views
- `kg lint` checks (owner, runbook, dashboard)
Once this loop works, scaling to 30+ services is incremental.
This approach aligns with:
- GitOps discipline
- infra‑as‑code thinking
- real ops workflows
- LLM‑assisted reasoning (without hallucination)
It revives RDF for what it does best:
holding structured truth, uncertainty, and provenance — at the same time.
Possible follow‑ups:
- flesh out the `core.ttl` starter ontology
- define canonical IRIs (`k8s:<cluster>/<ns>/<kind>/<name>`; see the sketch after this list)
- design a minimal SPARQL-ish DSL for Rust
- write a Flux → RDF extractor
- arc42 renderer templates
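For the canonical IRIs, note that `/` must be escaped in a Turtle prefixed local name, so path-style identifiers are usually simpler in full-IRI form (the base IRI here is an assumption):

```turtle
@prefix k8s: <https://example.org/k8s/> .

# k8s:<cluster>/<ns>/<kind>/<name>, written as a full IRI:
<https://example.org/k8s/prod-eu/payments/Deployment/payments-api>
    a k8s:Workload .
```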
This document is intentionally sufficient to start — without committing you to any particular implementation too early.
Ontology: https://gist.github.com/navicore/6cc18525dfe8b77ddf13d6ffc4b83c69