@kacy
Created March 10, 2026 22:29

yoq — GPU Infrastructure Without the Kubernetes Tax


The Problem

Kubernetes was designed for Google-scale web services in 2013. Slurm dates back to 2003. Both are built on antiquated technology stacks that predate modern kernel capabilities — io_uring, eBPF, cgroups v2, idmapped mounts — none of which they can take advantage of without layers of bolt-on complexity.

Most teams don't run Google. But they're stuck with Google's decade-old tooling.

  • 15+ components to run GPU workloads on K8s
  • Days to set up a production GPU cluster
  • 2-3 full-time engineers just to maintain it

The Kubernetes GPU stack: kubelet + kube-proxy + etcd + CNI + GPU Operator + device plugin + KAI Scheduler + RDMA plugin + Multus + cert-manager + ...

Every team running AI workloads faces the same impossible choice: Kubernetes or Slurm. K8s needs 15+ components just to schedule a GPU. Slurm dates to 2003: no native container runtime, no secrets management, no built-in TLS. Both require a dedicated platform team. That's 15-20% of a small company's headcount just babysitting infrastructure.


What yoq Does Today

This isn't a pitch for something we're going to build. This is built.

Written from scratch in Zig — a modern systems language that compiles to a single static binary with zero runtime dependencies. Zig gives us direct access to modern Linux kernel interfaces (io_uring, eBPF, cgroups v2, idmapped mounts) without the layers of abstraction that Go and C++ impose. The result: native kernel integration that Kubernetes, as architected, cannot match.

| Metric        | Value  |
| ------------- | ------ |
| Lines of Zig  | 55,000 |
| Tests passing | 1,035  |
| Binary size   | <15 MB |
| Dependencies  | 0      |

Capabilities:

  • Full container runtime — namespaces, cgroups v2, overlayfs, seccomp
  • OCI image pull/push/build — Dockerfile + TOML format
  • eBPF networking — DNS, load balancing, network policy (no kube-proxy, no iptables)
  • io_uring async I/O — zero-copy networking, native kernel event loop
  • Raft clustering — consensus, SWIM gossip, WireGuard mesh
  • Encrypted secrets — XChaCha20-Poly1305, rotation
  • TLS termination — ACME/Let's Encrypt, auto-renewal
  • Rolling deploys — health checks, automatic rollback
  • Security audited — all critical/high issues resolved
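
Several of these capabilities surface directly in the manifest. Below is a hedged sketch of how they might combine for a single service: the `health_check` and `tls` fields appear in yoq's own examples, but the `update` stanza and its field names are illustrative assumptions, not confirmed API.

```toml
# Illustrative fragment. `health_check` and `tls` match the shipped examples;
# the `update` table is a hypothetical name for the rolling-deploy knobs.
[service.api]
image = "myapp/api:latest"
env = ["API_TOKEN=${API_TOKEN}"]   # interpolated from the encrypted secrets store
health_check = { http = { path = "/health", port = 8080 } }
tls = { domain = "api.example.com", acme = true }   # ACME cert with auto-renewal
update = { strategy = "rolling", max_unavailable = 1, rollback_on_failure = true }
```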

Deploy in 4 Commands

```console
$ scp yoq node-01:/usr/local/bin/
$ ssh node-01 "yoq serve --init"
  server running on :7700, cluster token: ak7f...x2p

$ ssh node-02 "yoq join node-01:7700"
  joined cluster, node_id=2, overlay=10.40.0.2

$ yoq up manifest.toml
  deploying 3 services...
  ✓ db   running  10.42.1.2:5432
  ✓ api  running  10.42.2.3:8080
  ✓ web  running  10.42.1.4:3000  → :443 (TLS)
```

The manifest.toml behind those three services:

```toml
[service.db]
image = "postgres:16"
env = ["POSTGRES_PASSWORD=${DB_PASS}"]
volumes = ["data:/var/lib/postgresql/data"]

[service.api]
image = "myapp/api:latest"
depends_on = ["db"]
health_check = { http = { path = "/health", port = 8080 } }

[service.web]
image = "myapp/web:latest"
depends_on = ["api"]
tls = { domain = "app.example.com", acme = true }
```

The Business: GPU Mesh

Same simplicity. Applied to the fastest-growing infrastructure market.

Kubernetes + GPU Stack: GPU Operator, NVIDIA Device Plugin, KAI Scheduler, RDMA Device Plugin, Multus CNI, DCGM Exporter, Network Operator, Custom NCCL configs

yoq:

```toml
[service.training]
image = "pytorch-dist:latest"
replicas = 100

[service.training.gpu]
count = 1
model = "H100"                                # schedule onto H100 nodes
mesh = { enabled = true, backend = "nccl" }   # NCCL mesh across replicas

[service.training.checkpoint]
path = "/mnt/storage/checkpoints"
interval = "30m"
```

GPU detection, InfiniBand RDMA, NCCL topology, gang scheduling, checkpointing, fault recovery. All in the binary. All from the manifest.
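
To make the recovery story concrete, here is a hedged extension of the checkpoint stanza above. Only `path` and `interval` appear in the real example; `restore` and `max_restarts` are illustrative names for the claimed fault-recovery behavior, not confirmed manifest fields.

```toml
[service.training.checkpoint]
path = "/mnt/storage/checkpoints"
interval = "30m"       # snapshot every 30 minutes
restore = "latest"     # hypothetical: resume the gang from the newest snapshot after a node failure
max_restarts = 3       # hypothetical: stop rescheduling after repeated failures
```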


The Market

$7-8B container orchestration market, growing 30%+ YoY

Two wedges:

  • Broad — Kubernetes replacement for 10-500 node teams
  • Deep — GPU orchestration for AI training & inference

Who needs this:

  • AI/ML — Teams training on 50-500 GPUs
  • SaaS — Companies overpaying for managed K8s
  • On-prem — Regulated industries (finance, defense, healthcare)
  • Edge — GPU inference at the edge, no cloud dependency

The "overserved by K8s" segment is 20-30% of the total market ($1.4-2.4B). Capturing 1-2% of that segment = $15-30M ARR.


Competition

|                   | yoq             | Kubernetes           | Slurm          | Nomad               |
| ----------------- | --------------- | -------------------- | -------------- | ------------------- |
| Setup time        | Minutes         | Days                 | Hours          | Hours               |
| Components        | 1 binary        | 15+                  | 2 daemons      | 3 (+ Consul + Vault)|
| GPU scheduling    | Built-in        | 3 add-ons            | Native         | Basic plugin        |
| InfiniBand        | Built-in        | RDMA plugin + Multus | Native         | None                |
| Service discovery | eBPF (built-in) | CoreDNS              | None           | Requires Consul     |
| TLS + Secrets     | Built-in        | 2 add-ons            | None           | Requires Vault      |
| Fault recovery    | Auto checkpoint | Pod eviction         | Manual requeue | Restart only        |

yoq is the only self-contained GPU orchestrator that doesn't need another orchestrator underneath it.


Why Me

Kacy — Founder & CEO

  • Google Cloud (Current) — Lead, Cloud Alerting & Cloud Notifications. Owns the alerting and notification systems for all of Google Cloud Platform.
  • Google Distributed Cloud — One of the Engineering Leads. Built and shipped private cloud infrastructure for enterprises. Billion-dollar deals. Saw firsthand what happens when you architect cloud infrastructure poorly — and what it costs to fix.
  • Fitbit — Owned cloud infrastructure for several years, operating Fitbit's Kubernetes clusters at scale. Lived the pain of K8s operations from the operator side: the very pain yoq eliminates.
  • Why yoq — After years of watching teams drown in Kubernetes complexity — both as a builder and an operator — I decided to build the infrastructure I wish existed. 55,000 lines of working code is the proof.

Business Model

Open Source — Full orchestrator. Runtime, networking, clustering, GPU mesh. Free forever.

Enterprise — Multi-cluster federation, audit logging, SSO/RBAC, SLAs. $500-2K/node/year.

Cloud — Managed yoq clusters. One-click GPU training infra.

Comparable outcomes:

  • HashiCorp (Nomad) — Acquired by IBM for $6.4B — same OSS-core model
  • CoreWeave — $35B valuation running GPU infrastructure
  • Replicated — $1B+ valuation, enterprise K8s tooling

The Ask

$XM Seed — 6 months to v1.0. First paying customers.

Use of funds:

  • Engineering — 2 systems engineers (Zig/Linux/eBPF)
  • Infrastructure — 500-node scale validation
  • Security — GPU isolation audit
  • Customers — 5-10 design partners

Milestones:

  • Month 3 — GPU mesh working, design partner agreements
  • Month 6 — v1.0 shipped, 500-node validated
  • Month 9 — First enterprise contracts
  • Year 1 — $500K-1M ARR from 5-10 customers

Closing

The GPU infrastructure market is massive, chaotic, and hungry for simplicity.

We're building the obvious answer for the 90% of teams that don't need Kubernetes.

55,000 lines of working code. Zero dependencies. Ship it with scp.
