The CNCF Kubernetes AI Conformance program certifies that Kubernetes platforms can reliably run AI/ML workloads (training, inference, agentic). Certification is currently a self-assessment (automated tests are planned for 2026), is granted per Kubernetes version, and is valid for one year. Standard Kubernetes conformance is a prerequisite.
Conformance categories (MUST level):

- **Accelerators**: DRA support (SHOULD in v1.33, MUST in v1.34+); see the sketch after this list
- **Networking**: Gateway API for AI inference
- **Scheduling**: gang scheduling, cluster autoscaling, pod autoscaling (HPA)
- **Observability**: accelerator metrics, AI service metrics
- **Security**: secure accelerator access and isolation
- **Operator**: robust AI controller/CRD (e.g., Ray, Kubeflow)
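
To make the Accelerators requirement concrete, here is a minimal DRA sketch: a ResourceClaimTemplate requesting one GPU and a Pod consuming it. It assumes the NVIDIA DRA driver is installed and publishes a `gpu.nvidia.com` DeviceClass; the `resource.k8s.io/v1beta1` group version and the image are illustrative, and exact API versions and class names vary by driver and Kubernetes release.

```yaml
# Minimal DRA sketch (assumed: NVIDIA DRA driver with a gpu.nvidia.com DeviceClass).
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # DeviceClass published by the DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative image
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu   # ties this container to the pod-level claim above
```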
The matrices below show which open source projects each of the 28 AI-conformant stacks uses.

**Legend**

- `M` = explicitly mentioned with version
- `X` = referenced/used (version not specified)
- `-` = not mentioned
## v1.33 Submissions

| Project | chinaunicom-csk | cks | daocloud | gardener | giantswarm | jdcloud | jdos | openshift | palette | rke2 | talos |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Kueue | - | X | X | M | M | - | M | X | - | - | M |
| Volcano | - | - | - | - | - | X | - | - | - | - | - |
| Kai Scheduler | - | - | - | - | - | - | - | - | X | - | - |
| KubeRay | - | - | - | M | M | - | M | - | X | - | M |
| Kubeflow | - | X | - | - | - | - | - | X | X | - | X |
| Ray | - | - | - | X | - | X | X | - | - | - | X |
| NVIDIA GPU Operator | - | - | X | X | M | - | X | X | M | - | - |
| NVIDIA DCGM Exporter | - | - | - | X | - | X | X | X | - | - | - |
| NVIDIA Device Plugin | - | - | - | - | - | - | - | - | - | - | M |
| NVIDIA DRA Driver | - | - | - | - | M | - | - | - | - | - | - |
| NVIDIA Container Toolkit | - | - | - | - | - | - | - | - | - | - | X |
| AMD GPU Operator | - | - | - | - | - | - | - | X | - | - | - |
| Prometheus | - | - | X | X | - | X | X | X | X | - | - |
| Grafana | - | X | - | - | - | X | - | - | X | - | - |
| OpenTelemetry | - | - | X | - | - | X | - | - | - | - | - |
| Istio | - | - | X | - | - | - | - | - | - | - | - |
| Traefik | - | - | - | X | - | - | X | - | - | - | M |
| Cilium | - | - | - | - | - | - | - | - | - | - | X |
| Calico | - | - | - | - | - | - | - | - | M | - | - |
| Gateway API | - | X | - | X | M | - | X | X | X | - | M |
| KEDA | - | - | - | - | M | - | - | X | - | - | - |
| Karpenter | - | - | - | - | X | - | - | - | X | - | - |
| K8s Cluster Autoscaler | X | - | - | X | X | - | - | - | X | X | - |
| metrics-server | - | X | - | - | - | - | - | - | - | - | - |
| prometheus-adapter | - | - | X | X | - | - | X | - | - | - | - |
| Flux | - | - | - | - | X | - | - | - | - | - | - |
| JobSet | - | - | - | - | M | - | - | - | - | - | - |
| SUNK/Slurm | - | X | - | - | - | - | - | - | - | - | - |
| DeepSpeed | - | - | - | - | - | - | - | X | - | - | - |
| Sonobuoy | - | - | X | - | X | - | - | - | - | - | - |
| SUSE AI | - | - | - | - | - | - | - | - | - | X | - |
| KGateway/Kong | - | - | - | - | - | - | - | - | X | - | - |
| PyTorch Operator | - | - | - | - | - | X | - | X | - | - | - |
## v1.34 Submissions

| Project | ack | aks | baidu_cce | cks | eks | gardener | gke | kubermatic | lke | OKE | openshift | ovh | rke2 | talos | vks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kueue | - | X | - | X | - | M | X | X | - | X | X | - | - | M | - |
| Volcano | - | - | - | - | X | - | - | - | - | - | - | X | - | - | - |
| Kai Scheduler | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| KubeRay | - | - | - | - | - | X | - | - | - | - | - | M | - | M | - |
| Kubeflow | - | X | - | X | X | - | X | X | - | X | X | - | - | - | - |
| Ray | X | X | - | X | X | X | X | - | - | X | - | X | - | X | - |
| NVIDIA GPU Operator | - | - | - | - | - | X | - | X | - | - | X | X | - | - | - |
| NVIDIA DCGM Exporter | - | - | - | - | X | X | - | - | - | - | X | X | - | - | - |
| NVIDIA Device Plugin | - | - | - | - | - | - | - | - | - | X | - | - | - | M | - |
| NVIDIA DRA Driver | - | - | - | - | - | - | - | - | - | - | - | X | - | - | - |
| NVIDIA Container Toolkit | - | - | - | - | - | - | - | - | - | - | - | - | - | X | - |
| AMD GPU Operator | - | - | - | - | - | - | - | - | - | - | X | - | - | - | - |
| Prometheus | X | X | - | - | X | X | - | X | X | X | X | X | - | - | X |
| Grafana | - | - | - | X | X | - | - | - | X | - | - | - | - | - | - |
| OpenTelemetry | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| Istio | - | X | - | - | - | - | - | - | - | X | - | - | - | - | X |
| Traefik | - | - | - | - | - | X | - | - | - | - | - | - | - | X | - |
| Cilium | - | - | - | - | - | - | - | - | - | - | - | - | - | X | - |
| Gateway API | - | X | - | X | X | X | X | - | - | X | X | - | - | M | - |
| KEDA | - | X | - | - | X | - | - | - | - | - | X | - | - | - | - |
| Karpenter | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| K8s Cluster Autoscaler | - | - | - | - | - | X | - | X | - | X | - | X | X | - | - |
| metrics-server | - | - | - | X | - | - | - | - | - | - | - | - | - | - | - |
| prometheus-adapter | - | - | - | - | - | X | - | - | - | - | - | X | - | - | - |
| vLLM | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| AIBrix | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| NVIDIA Triton | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| KAITO | - | X | - | - | - | - | - | - | - | - | - | - | - | - | - |
| KubeLB | - | - | - | - | - | - | - | X | - | - | - | - | - | - | - |
| Yunikorn | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| LeaderWorkerSet | - | - | - | - | X | - | - | - | - | - | - | - | - | - | - |
| SUNK/Slurm | - | - | - | X | - | - | - | - | - | - | - | - | - | - | - |
| DeepSpeed | - | - | - | - | - | - | - | - | - | - | X | - | - | - | - |
| SUSE AI | - | - | - | - | - | - | - | - | - | - | - | - | X | - | - |
| PyTorch Operator | - | - | - | - | - | - | - | - | - | - | X | - | - | - | - |
## v1.35 Submissions

| Project | cks | gke |
|---|---|---|
| Kueue | X | X |
| Kubeflow | X | X |
| Ray | X | X |
| K-Gateway | X | X |
| metrics-server | X | - |
| SUNK/Slurm | X | - |
| Grafana | X | - |
| Gateway API | - | X |
## Open Source Project Adoption Summary

Total unique stacks using each project across all 28 submissions:

| # | Project | Stacks (of 28) | % | CNCF Status | Category |
|---|---|---|---|---|---|
| 1 | Kueue | 18 / 28 | 64% | K8s SIG | Scheduling |
| 2 | Gateway API | 17 / 28 | 61% | K8s SIG | Networking |
| 3 | Prometheus | 15 / 28 | 54% | Graduated | Observability |
| 4 | Ray | 14 / 28 | 50% | - | AI Framework |
| 5 | Kubeflow | 13 / 28 | 46% | Incubating | AI Platform |
| 6 | NVIDIA GPU Operator | 9 / 28 | 32% | - | Accelerator |
| 7 | KubeRay | 8 / 28 | 29% | - | AI Operator |
| 8 | NVIDIA DCGM Exporter | 8 / 28 | 29% | - | Observability |
| 9 | K8s Cluster Autoscaler | 8 / 28 | 29% | K8s Core | Autoscaling |
| 10 | Grafana | 6 / 28 | 21% | - | Observability |
| 11 | KEDA | 5 / 28 | 18% | Graduated | Autoscaling |
| 12 | Traefik | 5 / 28 | 18% | - | Networking |
| 13 | Istio | 4 / 28 | 14% | Graduated | Networking |
| 14 | K-Gateway (GW API Inf. Ext) | 4 / 28 | 14% | K8s SIG | Networking |
| 15 | prometheus-adapter | 4 / 28 | 14% | K8s SIG | Observability |
| 16 | SUNK/Slurm | 3 / 28 | 11% | - (Proprietary) | Scheduling |
| 17 | Karpenter | 3 / 28 | 11% | K8s SIG | Autoscaling |
| 18 | Volcano | 3 / 28 | 11% | Incubating | Scheduling |
| 19 | OpenTelemetry | 3 / 28 | 11% | Incubating | Observability |
| 20 | PyTorch Operator | 3 / 28 | 11% | - | AI Operator |
| 21 | metrics-server | 3 / 28 | 11% | K8s SIG | Observability |
| 22 | DeepSpeed | 2 / 28 | 7% | - | AI Framework |
| 23 | Cilium | 2 / 28 | 7% | Graduated | Networking |
| 24 | NVIDIA Device Plugin | 2 / 28 | 7% | - | Accelerator |
| 25 | NVIDIA DRA Driver | 2 / 28 | 7% | - | Accelerator |
| 26 | NVIDIA Container Toolkit | 2 / 28 | 7% | - | Accelerator |
| 27 | AMD GPU Operator | 2 / 28 | 7% | - | Accelerator |
| 28 | Kai Scheduler | 2 / 28 | 7% | Sandbox | Scheduling |
| 29 | SUSE AI | 2 / 28 | 7% | - | AI Platform |
| 30 | Sonobuoy | 2 / 28 | 7% | - | Testing |
| 31 | Flux | 1 / 28 | 4% | Graduated | GitOps |
| 32 | Calico | 1 / 28 | 4% | - | Networking |
| 33 | JobSet | 1 / 28 | 4% | K8s SIG | Scheduling |
| 34 | KubeLB | 1 / 28 | 4% | - | Networking |
| 35 | Yunikorn | 1 / 28 | 4% | ASF | Scheduling |
| 36 | LeaderWorkerSet | 1 / 28 | 4% | K8s SIG | Scheduling |
| 37 | vLLM | 1 / 28 | 4% | - | AI Inference |
| 38 | AIBrix | 1 / 28 | 4% | - | AI Inference |
| 39 | NVIDIA Triton | 1 / 28 | 4% | - | AI Inference |
| 40 | KAITO | 1 / 28 | 4% | - | AI Operator |
| 41 | KGateway/Kong | 1 / 28 | 4% | - | Networking |
## Open Source Project Reference

| Project | GitHub Repo | License | CNCF Status | Description |
|---|---|---|---|---|
| Kueue | kubernetes-sigs/kueue | Apache-2.0 | K8s SIG | Kubernetes-native job queueing |
| Volcano | volcano-sh/volcano | Apache-2.0 | Incubating | Batch scheduling for K8s |
| Kai Scheduler | kai-scheduler/KAI-Scheduler | Apache-2.0 | Sandbox | GPU-optimized AI scheduler |
| KubeRay | ray-project/kuberay | Apache-2.0 | - | Ray on Kubernetes operator |
| Kubeflow | kubeflow/kubeflow | Apache-2.0 | Incubating | ML platform for K8s |
| Kubeflow Trainer | kubeflow/training-operator | Apache-2.0 | Incubating | Distributed training operator |
| Ray | ray-project/ray | Apache-2.0 | - | Distributed AI framework |
| vLLM | vllm-project/vllm | Apache-2.0 | - | LLM inference engine |
| AIBrix | vllm-project/aibrix | Apache-2.0 | - | GenAI inference components |
| DeepSpeed | microsoft/DeepSpeed | Apache-2.0 | - | Distributed training library |
| NVIDIA GPU Operator | NVIDIA/gpu-operator | Apache-2.0 | - | GPU lifecycle management |
| NVIDIA DCGM Exporter | NVIDIA/dcgm-exporter | Apache-2.0 | - | GPU metrics for Prometheus |
| NVIDIA Device Plugin | NVIDIA/k8s-device-plugin | Apache-2.0 | - | GPU device plugin for K8s |
| NVIDIA DRA Driver | NVIDIA/k8s-dra-driver | Apache-2.0 | - | DRA driver for GPUs |
| NVIDIA Container Toolkit | NVIDIA/nvidia-container-toolkit | Apache-2.0 | - | GPU container runtime |
| AMD GPU Operator | ROCm/gpu-operator | Apache-2.0 | - | AMD GPU management |
| Prometheus | prometheus/prometheus | Apache-2.0 | Graduated | Monitoring system |
| Grafana | grafana/grafana | AGPL-3.0 | - | Observability platform |
| OpenTelemetry | open-telemetry/opentelemetry-collector | Apache-2.0 | Incubating | Telemetry collection |
| Istio | istio/istio | Apache-2.0 | Graduated | Service mesh |
| Traefik | traefik/traefik | MIT | - | Cloud-native proxy |
| Cilium | cilium/cilium | Apache-2.0 | Graduated | eBPF networking |
| Calico | projectcalico/calico | Apache-2.0 | - | K8s networking |
| KEDA | kedacore/keda | Apache-2.0 | Graduated | Event-driven autoscaling |
| Karpenter | kubernetes-sigs/karpenter | Apache-2.0 | K8s SIG | Node autoscaling |
| K8s Cluster Autoscaler | kubernetes/autoscaler | Apache-2.0 | K8s Core | Cluster autoscaling |
| metrics-server | kubernetes-sigs/metrics-server | Apache-2.0 | K8s SIG | Resource metrics |
| prometheus-adapter | kubernetes-sigs/prometheus-adapter | Apache-2.0 | K8s SIG | Custom metrics API |
| Gateway API | kubernetes-sigs/gateway-api | Apache-2.0 | K8s SIG | K8s networking API |
| K-Gateway | kubernetes-sigs/gateway-api-inference-extension | Apache-2.0 | K8s SIG | AI inference gateway |
| Flux | fluxcd/flux2 | Apache-2.0 | Graduated | GitOps toolkit |
| JobSet | kubernetes-sigs/jobset | Apache-2.0 | K8s SIG | Multi-job orchestration |
| LeaderWorkerSet | kubernetes-sigs/lws | Apache-2.0 | K8s SIG | LLM inference sharding |
| KubeLB | kubermatic/kubelb | Apache-2.0 | - | Centralized load balancing |
| Yunikorn | apache/yunikorn-core | Apache-2.0 | ASF | Resource scheduler |
| KAITO | microsoft/kaito | MIT | - | AI toolchain operator |
| NVIDIA Triton | triton-inference-server/server | BSD-3-Clause | - | Inference server |
| Gardener | gardener/gardener | Apache-2.0 | - | K8s cluster management |
| Talos Linux | siderolabs/talos | MPL-2.0 | - | Minimal K8s OS |
| RKE2 | rancher/rke2 | Apache-2.0 | - | Secure K8s distribution |
| Omni | siderolabs/omni | BSL-1.1 | - | Talos cluster management |
| podinfo | stefanprodan/podinfo | Apache-2.0 | - | K8s test microservice |
| CubeFS (formerly ContainerFS) | cubefs/cubefs | Apache-2.0 | Graduated | Distributed storage |
## NVIDIA Ecosystem Summary

NVIDIA dominates the accelerator layer across the AI conformance program. Every submission relies on NVIDIA technology in some form, either directly via open source NVIDIA projects or indirectly through managed cloud GPU services built on NVIDIA hardware.
### NVIDIA Project Adoption Across All 28 Stacks

| NVIDIA Project | GitHub Repo | Stacks (of 28) | % | Versions Observed |
|---|---|---|---|---|
| NVIDIA GPU Operator | NVIDIA/gpu-operator | 9 | 32% | v1.0.1 (Giant Swarm), v25.3.4 (Palette) |
| NVIDIA DCGM Exporter | NVIDIA/dcgm-exporter | 8 | 29% | not specified |
| NVIDIA Device Plugin | NVIDIA/k8s-device-plugin | 2 | 7% | v0.14.5 (Talos) |
| NVIDIA DRA Driver | NVIDIA/k8s-dra-driver | 2 | 7% | v25.3.0 (Giant Swarm) |
| NVIDIA Container Toolkit | NVIDIA/nvidia-container-toolkit | 2 | 7% | not specified |
| NVIDIA Triton Inference Server | triton-inference-server/server | 1 | 4% | not specified |
| Kai Scheduler (originally NVIDIA) | kai-scheduler/KAI-Scheduler | 2 | 7% | not specified |
### NVIDIA Project Usage by Submission (detailed)

| Submission | Version | GPU Operator | DCGM Exporter | Device Plugin | DRA Driver | Container Toolkit | Triton | Kai Scheduler | NVIDIA Projects Used |
|---|---|---|---|---|---|---|---|---|---|
| chinaunicom-csk | v1.33 | - | - | - | - | - | - | - | 0 |
| cks | v1.33 | - | - | - | - | - | - | - | 0\* |
| daocloud | v1.33 | X | - | - | - | - | - | - | 1 |
| gardener | v1.33 | X | X | - | - | - | - | - | 2 |
| giantswarm | v1.33 | X | - | - | X | - | - | - | 2 |
| jdcloud | v1.33 | - | X | - | - | - | - | - | 1 |
| jdos | v1.33 | X | X | - | - | - | - | - | 2 |
| openshift | v1.33 | X | X | - | - | - | - | - | 2 |
| palette | v1.33 | X | - | - | - | - | - | X | 2 |
| rke2 | v1.33 | - | - | - | - | - | - | - | 0\*\* |
| talos | v1.33 | - | - | X | - | X | - | - | 2 |
| ack | v1.34 | - | - | - | - | - | - | - | 0\* |
| aks | v1.34 | - | - | - | - | - | - | - | 0\* |
| baidu_cce | v1.34 | - | - | - | - | - | - | - | 0\* |
| cks | v1.34 | - | - | - | - | - | - | - | 0\* |
| eks | v1.34 | - | X | - | - | - | X | X | 3 |
| gardener | v1.34 | X | X | - | - | - | - | - | 2 |
| gke | v1.34 | - | - | - | - | - | - | - | 0\* |
| kubermatic | v1.34 | X | - | - | - | - | - | - | 1 |
| lke | v1.34 | - | - | - | - | - | - | - | 0\* |
| OKE | v1.34 | - | - | X | - | - | - | - | 1 |
| openshift | v1.34 | X | X | - | - | - | - | - | 2 |
| ovh | v1.34 | X | X | - | X | - | - | - | 3 |
| rke2 | v1.34 | - | - | - | - | - | - | - | 0\*\* |
| talos | v1.34 | - | - | X | - | X | - | - | 2 |
| vks | v1.34 | - | - | - | - | - | - | - | 0\* |
| cks | v1.35 | - | - | - | - | - | - | - | 0\* |
| gke | v1.35 | - | - | - | - | - | - | - | 0\* |
\* Managed cloud services use NVIDIA GPUs but do not disclose specific NVIDIA software components in their submissions.
\*\* The RKE2 submissions ship SUSE AI, which likely includes NVIDIA components but bundles them under its own umbrella.
- NVIDIA has 100% hardware penetration: every AI-conformant Kubernetes stack uses NVIDIA GPUs, making it the only universal hardware dependency in the program.
- The GPU Operator + DCGM Exporter pairing is the de facto standard for self-managed platforms (32% and 29% explicit adoption), while managed clouds abstract these away; a monitoring sketch follows these notes.
- DRA Driver adoption is nascent: only 2 stacks (Giant Swarm, OVH) explicitly use the new k8s-dra-driver, versus 2 still on the legacy Device Plugin. Most managed clouds have not disclosed their DRA implementation details.
- NVIDIA Triton appears in only 1 submission (EKS) despite being a leading inference server, suggesting most vendors treat inference serving as user-deployed rather than platform-provided.
- Kai Scheduler, originally an NVIDIA project (now CNCF Sandbox), is used by 2 stacks (Palette, EKS), giving NVIDIA a presence in the scheduling layer as well.
- Only 3 of 28 stacks support any non-NVIDIA accelerator: OpenShift (AMD), GKE (TPU), and EKS (Trainium/Inferentia). NVIDIA is therefore the sole hardware dependency for 93% of conformant platforms.
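
As a concrete illustration of the monitoring pairing, here is a minimal sketch of scraping DCGM metrics into Prometheus via a prometheus-operator ServiceMonitor. The namespace, label selector, and port name are assumptions based on common dcgm-exporter chart defaults, not values taken from any submission:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: gpu-operator                    # assumed install namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dcgm-exporter  # assumed chart label
  endpoints:
  - port: metrics                            # assumed Service port name
    interval: 30s
```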
## Key Findings

### 1. Market Participation

- 28 unique submissions from 21 distinct vendors across 3 Kubernetes versions
- 5 vendors have submitted for multiple versions (CoreWeave leads, covering all 3)
- v1.34 has the most submissions (15), suggesting the program gained traction after launch
- v1.35 is still early, with only 2 submissions (CoreWeave and Google)

### 2. Dominant Open Source Stack

The "default" AI-conformant Kubernetes stack converges on:

- Kueue for gang scheduling (18/28 stacks, 64%)
- Prometheus for observability (15/28, 54%)
- Gateway API for networking (17/28, 61%); see the sketch after this list
- Ray/KubeRay for AI operators (14/28, 50%)
- NVIDIA GPU Operator + DCGM Exporter for accelerator management (9/28 and 8/28)
- Kubernetes Cluster Autoscaler or Karpenter for scaling
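
To ground the Gateway API item, here is a minimal sketch of exposing an inference Service through a Gateway and HTTPRoute. The gateway class, route path, and backend Service are illustrative assumptions, not taken from any submission:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gw
spec:
  gatewayClassName: example-gateway-class   # illustrative; depends on the installed controller
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gw
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/completions
    backendRefs:
    - name: llm-server         # illustrative inference Service
      port: 8000
```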
### 3. Scheduling Landscape

- Kueue dominates gang scheduling with 64% adoption; a minimal wiring sketch follows this list
- Volcano is a distant second (3 stacks, primarily Chinese cloud providers plus OVH)
- AWS EKS stands out by supporting the most schedulers (Volcano, Kai, Yunikorn, LeaderWorkerSet, AWS Batch)
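
The canonical Kueue wiring behind that adoption number is a quota-bearing ClusterQueue, a namespaced LocalQueue, and a suspended Job labeled for the queue. A minimal sketch with illustrative names and quotas (none taken from any submission):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-cluster-queue
spec:
  namespaceSelector: {}          # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 64         # illustrative quotas
      - name: "memory"
        nominalQuota: 256Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-queue
  namespace: ml-team             # illustrative namespace
spec:
  clusterQueue: gpu-cluster-queue
---
apiVersion: batch/v1
kind: Job
metadata:
  name: train
  namespace: ml-team
  labels:
    kueue.x-k8s.io/queue-name: team-queue   # Kueue admits and unsuspends the Job
spec:
  suspend: true                  # created suspended; Kueue starts it once quota fits
  completions: 4
  parallelism: 4                 # all 4 pods admitted together (gang-style)
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative image
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
```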
### 4. Accelerator Diversity

- NVIDIA GPUs are universal: every submission uses NVIDIA in some form
- AMD GPUs (ROCm): only Red Hat OpenShift
- Google TPU: only GKE
- AWS Trainium/Inferentia: only EKS
- GPU sharing/MIG: only GKE (v1.35) explicitly covers the SHOULD-level requirements; a time-slicing sketch follows this list
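
For context on GPU sharing, one common mechanism on self-managed stacks is time-slicing through the NVIDIA device plugin's sharing config. A minimal sketch (the replica count is an illustrative assumption; MIG partitioning is a separate, hardware-level mechanism):

```yaml
# Time-slicing config for the NVIDIA k8s-device-plugin (illustrative values).
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4   # each physical GPU is advertised as 4 schedulable nvidia.com/gpu units
```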
Use this file to detect deltas in future analyses.

- Analysis date: 2026-03-16
- Repository: github.com/cncf/k8s-ai-conformance
- Commit SHA: 223d15f97434ea478f1440d73901435d16503682
- Branch: main
## Submission Inventory at Time of Analysis

### v1.33 (11 submissions)

chinaunicom-csk, cks, daocloud, gardener, giantswarm, jdcloud, jdos, openshift, palette, rke2, talos

### v1.34 (15 submissions)

OKE, ack, aks, baidu_cce, cks, eks, gardener, gke, kubermatic, lke, openshift, ovh, rke2, talos, vks

### v1.35 (2 submissions)

cks, gke

Version directories present: v1.33, v1.34, v1.35.

Total submissions: 28. Total unique vendors: 21.
## How to Detect Deltas

```sh
# From repo root, compare current submissions against this baseline:
git log --oneline 223d15f..HEAD -- 'v1.*/*/PRODUCT.yaml'

# List all current submissions:
find v1.* -name PRODUCT.yaml -type f | sort

# New version directories:
ls -d v1.* | sort
```