CNCF Kubernetes AI Conformance - Full Analysis

Date: 2026-03-16
Repository: github.com/cncf/k8s-ai-conformance
Commit SHA: 223d15f97434ea478f1440d73901435d16503682
Branch: main

Overview
Submission Matrix by Version
Software Stack Matrix
Open Source Project Usage Matrix
Open Source Project Reference
NVIDIA Ecosystem Summary
Key Findings

Overview

The CNCF Kubernetes AI Conformance program certifies that Kubernetes platforms can reliably run AI/ML workloads (training, inference, agentic). It is currently a self-assessment process; automated tests are planned for 2026. Certifications are issued per Kubernetes version and are valid for one year. Prerequisite: an existing Kubernetes conformance certification.

Conformance categories (MUST level):

Accelerators - DRA support (SHOULD in v1.33, MUST in v1.34+)
Networking - Gateway API for AI inference
Scheduling - Gang scheduling, cluster autoscaling, pod autoscaling (HPA)
Observability - Accelerator metrics, AI service metrics
Security - Secure accelerator access / isolation
Operator - Robust AI controller/CRD (e.g., Ray, Kubeflow)

v1.35 additions (SHOULD level): driver_runtime_management, gpu_sharing, virtualized_accelerator
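
The version-dependent requirement level for DRA can be written as a small lookup. This is only an illustrative sketch: the names `Level`, `parse_minor`, and `dra_level` are hypothetical and are not part of the conformance repository. It encodes nothing beyond the rule stated above (SHOULD in v1.33, MUST in v1.34+).

```python
from enum import Enum


class Level(Enum):
    SHOULD = "SHOULD"
    MUST = "MUST"


def parse_minor(version: str) -> int:
    """Return the minor number from a version string such as 'v1.34'."""
    return int(version.lstrip("v").split(".")[1])


def dra_level(version: str) -> Level:
    """DRA support is SHOULD in v1.33 and MUST from v1.34 onward.

    Versions before v1.33 predate the program; this sketch simply
    treats them like v1.33.
    """
    return Level.MUST if parse_minor(version) >= 34 else Level.SHOULD


for v in ("v1.33", "v1.34", "v1.35"):
    print(v, dra_level(v).value)
```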

Current submissions: 11 (v1.33) + 15 (v1.34) + 2 (v1.35) = 28 total

Submission Matrix by Version

v1.33 (11 submissions)

#	Directory	Vendor	Product	Platform Version	K8s Version	Cloud/On-Prem
1	chinaunicom-csk	Chinaunicom Cloud	CSK	v1.33	v1.33	Cloud
2	cks	CoreWeave	CoreWeave Kubernetes Service	v1.33	v1.33	Cloud (GPU)
3	daocloud	DaoCloud	DaoCloud Enterprise	v5.0	v1.33	On-Prem
4	gardener	NeoNephos Foundation	Gardener	v1.130.0	v1.33	Multi-Cloud
5	giantswarm	Giant Swarm	Giant Swarm Platform	1.33.0	v1.33	AWS
6	jdcloud	JD Cloud	JCS for Kubernetes	v1.33.3	v1.33	Cloud
7	jdos	JD.com	JDOS	v3.0	v1.33	On-Prem
8	openshift	Red Hat	OpenShift Container Platform	4.20	v1.33	Hybrid
9	palette	Spectro Cloud	Spectro Cloud Palette	4.8.x	v1.33	AWS
10	rke2	SUSE	RKE2	v1.33	v1.33	Any
11	talos	Sidero Labs	Talos Linux	1.11.3	v1.33	Bare Metal

v1.34 (15 submissions)

#	Directory	Vendor	Product	Platform Version	K8s Version	Cloud/On-Prem
1	ack	Alibaba Cloud	ACK	1.34.1-aliyun.1	v1.34	Cloud
2	aks	Microsoft	Azure Kubernetes Service	v1.34	v1.34	Cloud
3	baidu_cce	Baidu Cloud	CCE	1.34	v1.34	Cloud
4	cks	CoreWeave	CoreWeave Kubernetes Service	v1.34	v1.34	Cloud (GPU)
5	eks	AWS	Amazon EKS	1.34.1-eks.4	v1.34	Cloud
6	gardener	NeoNephos Foundation	Gardener	v1.134.2	v1.34	Multi-Cloud
7	gke	Google	Google Kubernetes Engine	1.34.0-gke.1662000	v1.34	Cloud
8	kubermatic	Kubermatic	Kubermatic Kubernetes Platform	v2.29	v1.34	Multi-Cloud
9	lke	Akamai	Linode Kubernetes Engine	v1.34	v1.34	Cloud
10	OKE	Oracle	OCI Kubernetes Engine	v1.34	v1.34	Cloud
11	openshift	Red Hat	OpenShift Container Platform	4.21	v1.34	Hybrid
12	ovh	OVHcloud	OVHcloud Managed Kubernetes	1.0	v1.34	Cloud
13	rke2	SUSE	RKE2	v1.34	v1.34	Any
14	talos	Sidero Labs	Talos Linux	1.11.3	v1.34	Bare Metal
15	vks	Broadcom	vSphere Kubernetes Service	v3.5.0	v1.34	On-Prem (VMware)

v1.35 (2 submissions)

#	Directory	Vendor	Product	Platform Version	K8s Version	Cloud/On-Prem
1	cks	CoreWeave	CoreWeave Kubernetes Service	v1.35	v1.35	Cloud (GPU)
2	gke	Google	Google Kubernetes Engine	1.35.0-gke.2232000	v1.35	Cloud

Cross-Version Presence

Vendor	Product	v1.33	v1.34	v1.35
CoreWeave	CKS	X	X	X
Sidero Labs	Talos Linux	X	X
NeoNephos Foundation	Gardener	X	X
Red Hat	OpenShift	X	X
SUSE	RKE2	X	X
Google	GKE		X	X
Alibaba Cloud	ACK		X
Microsoft	AKS		X
AWS	EKS		X
Oracle	OKE		X
Akamai	LKE		X
OVHcloud	MKS		X
Kubermatic	KKP		X
Broadcom	VKS		X
Baidu Cloud	CCE		X
Chinaunicom Cloud	CSK	X
DaoCloud	DaoCloud Enterprise	X
Giant Swarm	Giant Swarm Platform	X
JD Cloud	JCS for Kubernetes	X
JD.com	JDOS	X
Spectro Cloud	Palette	X
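
The presence table can be tallied mechanically. The sketch below transcribes the table into a dictionary (the name `PRESENCE` is an assumption made for illustration) and derives per-version submission counts and the list of vendors present in more than one version.

```python
from collections import Counter

# Versions per vendor, transcribed from the Cross-Version Presence table.
PRESENCE = {
    "CoreWeave": ["v1.33", "v1.34", "v1.35"],
    "Sidero Labs": ["v1.33", "v1.34"],
    "NeoNephos Foundation": ["v1.33", "v1.34"],
    "Red Hat": ["v1.33", "v1.34"],
    "SUSE": ["v1.33", "v1.34"],
    "Google": ["v1.34", "v1.35"],
    "Alibaba Cloud": ["v1.34"],
    "Microsoft": ["v1.34"],
    "AWS": ["v1.34"],
    "Oracle": ["v1.34"],
    "Akamai": ["v1.34"],
    "OVHcloud": ["v1.34"],
    "Kubermatic": ["v1.34"],
    "Broadcom": ["v1.34"],
    "Baidu Cloud": ["v1.34"],
    "Chinaunicom Cloud": ["v1.33"],
    "DaoCloud": ["v1.33"],
    "Giant Swarm": ["v1.33"],
    "JD Cloud": ["v1.33"],
    "JD.com": ["v1.33"],
    "Spectro Cloud": ["v1.33"],
}

# Submissions per Kubernetes version.
per_version = Counter(v for versions in PRESENCE.values() for v in versions)

# Vendors that appear in more than one version column.
multi_version = sorted(name for name, vs in PRESENCE.items() if len(vs) > 1)

print("vendors:", len(PRESENCE))
print("submissions:", sum(per_version.values()), dict(per_version))
print("multi-version vendors:", multi_version)
```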

Software Stack Matrix

Gang Scheduling Solutions

Solution	v1.33 Users	v1.34 Users	v1.35 Users	Total Stacks
Kueue	gardener, giantswarm, jdos, daocloud, cks, openshift (RH build)	aks, cks, gardener, gke, kubermatic, OKE, openshift (RH build), talos	cks, gke	18
Volcano	jdcloud	ovh, eks		3
Kai Scheduler	palette	eks		2
SUNK/Slurm	cks	cks	cks	3
Yunikorn		eks		1
LeaderWorkerSet		eks		1
AWS Batch		eks		1
Not specified	chinaunicom-csk, rke2, talos	ack, baidu_cce, lke, rke2, vks		7

Gateway / Ingress Solutions

Solution	v1.33 Users	v1.34 Users	v1.35 Users	Total
K-Gateway / GW API Inference Ext	cks	cks	cks, gke	4
Traefik	gardener, jdos, talos	gardener, talos		5
Istio	daocloud	aks, OKE, vks		4
KGateway / Kong	palette			1
GKE Gateway		gke	gke	2
KubeLB		kubermatic		1
AWS ALB Controller		eks		1
JD Cloud API GW	jdcloud			1
Cilium	talos	talos		2
Gateway API (impl unspecified)	chinaunicom-csk, giantswarm, openshift, rke2	ack, baidu_cce, lke, openshift, rke2		9

AI Operators / Frameworks

Operator	v1.33 Users	v1.34 Users	v1.35 Users	Total
KubeRay	gardener (v1.3.0), giantswarm (v1.0.0), jdos (v1.3.0), palette, talos (v1.4.2)	gardener, ovh (v1.5.1), talos (v1.4.2)		8
Kubeflow / Training Operator	openshift (Trainer V1), palette	aks, gke, kubermatic, OKE, openshift (Trainer V1)	gke	8
Ray (framework, not operator)	jdcloud	ack, cks, eks, gke, OKE	cks, gke	8
PyTorch Operator	jdcloud, openshift	openshift		3
vLLM		eks		1
AIBrix		eks		1
NVIDIA Triton		eks		1
KAITO (MS AI Toolchain)		aks		1
DeepSpeed	openshift	openshift		2
Not specified	chinaunicom-csk, cks, rke2	baidu_cce, cks, lke, rke2, vks	cks	9

GPU / Accelerator Stack

Component	v1.33 Users	v1.34 Users	v1.35 Users	Total
NVIDIA GPU Operator	daocloud, gardener, giantswarm (v1.0.1), openshift, palette (v25.3.4)	gardener, kubermatic, openshift, ovh		9
NVIDIA DCGM Exporter	gardener, jdcloud, jdos, openshift	eks, gardener, openshift, ovh		8
NVIDIA Device Plugin	talos (v0.14.5)	talos (v0.14.5)		2
NVIDIA DRA Driver	giantswarm (v25.3.0)	ovh		2
NVIDIA Container Toolkit	talos	talos		2
AMD GPU Operator (ROCm)	openshift	openshift		2
SUSE AI (integrated stack)	rke2	rke2		2
OCI GPU Plugin		OKE		1
Managed/unspecified GPU stack	chinaunicom-csk, cks	ack, aks, baidu_cce, cks, eks, gke, lke, vks	cks, gke	12

Cluster Autoscaling Solutions

Solution	v1.33 Users	v1.34 Users	v1.35 Users	Total
Kubernetes Cluster Autoscaler	chinaunicom-csk, gardener, giantswarm, rke2	gardener, OKE, ovh, rke2		8
Karpenter	giantswarm, palette	eks		3
Platform-native autoscaler	jdcloud	ack, aks, baidu_cce, cks, gke, kubermatic, lke, openshift, vks	cks, gke	12
N/A (bare metal/on-prem)	daocloud, talos	talos		3

Pod Autoscaling (HPA) Solutions

Solution	v1.33 Users	v1.34 Users	v1.35 Users	Total
KEDA	giantswarm (v3.1.0), openshift	aks, eks, openshift		5
prometheus-adapter	gardener, jdos	gardener, ovh		4
DCGM + custom metrics	gardener, jdos, palette	eks, gardener, ovh		6
Neuron Monitor (AWS)		eks		1
CronHPA	jdcloud			1
Standard HPA / metrics-server	chinaunicom-csk, cks, daocloud, rke2, talos	ack, baidu_cce, cks, gke, kubermatic, lke, OKE, rke2, talos, vks	cks, gke	17

Observability Stack

Component	v1.33 Users	v1.34 Users	v1.35 Users	Total
Prometheus	daocloud, gardener, jdcloud, jdos, openshift, palette	aks, eks, gardener, kubermatic, lke, OKE, openshift, ovh, vks		15
Grafana	cks, palette	cks, eks, lke	cks	6
OpenTelemetry	daocloud, jdcloud	eks (ADOT)		3
CloudWatch		eks		1
Azure Monitor		aks		1
GKE Observability		gke	gke	2

Container Runtime / CNI / OS (where specified)

Component	Submissions
OS: Ubuntu 22.04	palette (v1.33)
OS: Amazon Linux 2023	eks (v1.34)
OS: Bottlerocket	eks (v1.34)
OS: Talos Linux	talos (v1.33, v1.34)
OS: RHCOS	openshift (v1.33, v1.34)
OS: Photon OS	vks (v1.34)
CNI: Calico	palette (v3.30.3)
CNI: Cilium	talos (v1.33, v1.34)
CNI: Canal (Calico+Flannel)	rke2
Runtime: CRI-O	openshift
Runtime: containerd	rke2, talos, vks

Hardware (where specified)

GPU Model	Submissions
NVIDIA A100	giantswarm/v1.33 (p4d.24xlarge), ovh/v1.34 (MIG ref)
NVIDIA A10G	palette/v1.33
NVIDIA Tesla T4	giantswarm/v1.33, kubermatic/v1.34
NVIDIA Tesla V100	ovh/v1.34 (V100-PCIE-16GB)
NVIDIA Quadro P1000	talos/v1.33, talos/v1.34
Google TPU	gke/v1.34, gke/v1.35
AWS Trainium	eks/v1.34
AWS Inferentia	eks/v1.34
AMD GPUs (ROCm)	openshift/v1.33, openshift/v1.34

Open Source Project Usage Matrix

This matrix shows which open source projects are used across all 28 AI-conformant stacks.

Legend

M = Explicitly mentioned with version
X = Referenced/used (version not specified)
- = Not mentioned

v1.33 Submissions

Project	chinaunicom-csk	cks	daocloud	gardener	giantswarm	jdcloud	jdos	openshift	palette	rke2	talos
Kueue	-	X	X	M	M	-	M	X	-	-	M
Volcano	-	-	-	-	-	X	-	-	-	-	-
Kai Scheduler	-	-	-	-	-	-	-	-	X	-	-
KubeRay	-	-	-	M	M	-	M	-	X	-	M
Kubeflow	-	X	-	-	-	-	-	X	X	-	X
Ray	-	-	-	X	-	X	X	-	-	-	X
NVIDIA GPU Operator	-	-	X	X	M	-	X	X	M	-	-
NVIDIA DCGM Exporter	-	-	-	X	-	X	X	X	-	-	-
NVIDIA Device Plugin	-	-	-	-	-	-	-	-	-	-	M
NVIDIA DRA Driver	-	-	-	-	M	-	-	-	-	-	-
NVIDIA Container Toolkit	-	-	-	-	-	-	-	-	-	-	X
AMD GPU Operator	-	-	-	-	-	-	-	X	-	-	-
Prometheus	-	-	X	X	-	X	X	X	X	-	-
Grafana	-	X	-	-	-	X	-	-	X	-	-
OpenTelemetry	-	-	X	-	-	X	-	-	-	-	-
Istio	-	-	X	-	-	-	-	-	-	-	-
Traefik	-	-	-	X	-	-	X	-	-	-	M
Cilium	-	-	-	-	-	-	-	-	-	-	X
Calico	-	-	-	-	-	-	-	-	M	-	-
Gateway API	-	X	-	X	M	-	X	X	X	-	M
KEDA	-	-	-	-	M	-	-	X	-	-	-
Karpenter	-	-	-	-	X	-	-	-	X	-	-
K8s Cluster Autoscaler	X	-	-	X	X	-	-	-	X	X	-
metrics-server	-	X	-	-	-	-	-	-	-	-	-
prometheus-adapter	-	-	X	X	-	-	X	-	-	-	-
Flux	-	-	-	-	X	-	-	-	-	-	-
JobSet	-	-	-	-	M	-	-	-	-	-	-
SUNK/Slurm	-	X	-	-	-	-	-	-	-	-	-
DeepSpeed	-	-	-	-	-	-	-	X	-	-	-
Sonobuoy	-	-	X	-	X	-	-	-	-	-	-
SUSE AI	-	-	-	-	-	-	-	-	-	X	-
KGateway/Kong	-	-	-	-	-	-	-	-	X	-	-
PyTorch Operator	-	-	-	-	-	X	-	X	-	-	-

v1.34 Submissions

Project	ack	aks	baidu_cce	cks	eks	gardener	gke	kubermatic	lke	OKE	openshift	ovh	rke2	talos	vks
Kueue	-	X	-	X	-	M	X	X	-	X	X	-	-	M	-
Volcano	-	-	-	-	X	-	-	-	-	-	-	X	-	-	-
Kai Scheduler	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
KubeRay	-	-	-	-	-	X	-	-	-	-	-	M	-	M	-
Kubeflow	-	X	-	X	X	-	X	X	-	X	X	-	-	-	-
Ray	X	X	-	X	X	X	X	-	-	X	-	X	-	X	-
NVIDIA GPU Operator	-	-	-	-	-	X	-	X	-	-	X	X	-	-	-
NVIDIA DCGM Exporter	-	-	-	-	X	X	-	-	-	-	X	X	-	-	-
NVIDIA Device Plugin	-	-	-	-	-	-	-	-	-	X	-	-	-	M	-
NVIDIA DRA Driver	-	-	-	-	-	-	-	-	-	-	-	X	-	-	-
NVIDIA Container Toolkit	-	-	-	-	-	-	-	-	-	-	-	-	-	X	-
AMD GPU Operator	-	-	-	-	-	-	-	-	-	-	X	-	-	-	-
Prometheus	X	X	-	-	X	X	-	X	X	X	X	X	-	-	X
Grafana	-	-	-	X	X	-	-	-	X	-	-	-	-	-	-
OpenTelemetry	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
Istio	-	X	-	-	-	-	-	-	-	X	-	-	-	-	X
Traefik	-	-	-	-	-	X	-	-	-	-	-	-	-	X	-
Cilium	-	-	-	-	-	-	-	-	-	-	-	-	-	X	-
Gateway API	-	X	-	X	X	X	X	-	-	X	X	-	-	M	-
KEDA	-	X	-	-	X	-	-	-	-	-	X	-	-	-	-
Karpenter	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
K8s Cluster Autoscaler	-	-	-	-	-	X	-	X	-	X	-	X	X	-	-
metrics-server	-	-	-	X	-	-	-	-	-	-	-	-	-	-	-
prometheus-adapter	-	-	-	-	-	X	-	-	-	-	-	X	-	-	-
vLLM	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
AIBrix	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
NVIDIA Triton	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
KAITO	-	X	-	-	-	-	-	-	-	-	-	-	-	-	-
KubeLB	-	-	-	-	-	-	-	X	-	-	-	-	-	-	-
Yunikorn	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
LeaderWorkerSet	-	-	-	-	X	-	-	-	-	-	-	-	-	-	-
SUNK/Slurm	-	-	-	X	-	-	-	-	-	-	-	-	-	-	-
DeepSpeed	-	-	-	-	-	-	-	-	-	-	X	-	-	-	-
SUSE AI	-	-	-	-	-	-	-	-	-	-	-	-	X	-	-
PyTorch Operator	-	-	-	-	-	-	-	-	-	-	X	-	-	-	-

v1.35 Submissions

Project	cks	gke
Kueue	X	X
Kubeflow	X	X
Ray	X	X
K-Gateway	X	X
metrics-server	X	-
SUNK/Slurm	X	-
Grafana	X	-
Gateway API	-	X
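
Because the matrices are tab-separated and use the M / X / - legend, per-project adoption can be counted with a few lines of code. The following is a minimal sketch using the v1.35 matrix as input; `MATRIX` and `adoption_counts` are hypothetical names introduced for this example.

```python
import csv
import io

# The v1.35 matrix, tab-separated as in this document.
MATRIX = (
    "Project\tcks\tgke\n"
    "Kueue\tX\tX\n"
    "Kubeflow\tX\tX\n"
    "Ray\tX\tX\n"
    "K-Gateway\tX\tX\n"
    "metrics-server\tX\t-\n"
    "SUNK/Slurm\tX\t-\n"
    "Grafana\tX\t-\n"
    "Gateway API\t-\tX\n"
)


def adoption_counts(tsv_text):
    """Map each project to the number of stacks marked M or X."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    body = rows[1:]  # skip the header row
    return {row[0]: sum(cell in ("M", "X") for cell in row[1:]) for row in body if row}


counts = adoption_counts(MATRIX)
for project, n in counts.items():
    print(f"{project}: {n}")
```

The same function works on the larger v1.33 and v1.34 matrices if they are supplied in the same tab-separated form.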

Open Source Project Adoption Summary

Total unique stacks using each project across all 28 submissions:

#	Project	Stacks (of 28)	%	CNCF Status	Category
1	Kueue	18 / 28	64%	K8s SIG	Scheduling
2	Gateway API	17 / 28	61%	K8s SIG	Networking
3	Prometheus	15 / 28	54%	Graduated	Observability
4	Ray	14 / 28	50%	-	AI Framework
5	Kubeflow	13 / 28	46%	Incubating	AI Platform
6	NVIDIA GPU Operator	9 / 28	32%	-	Accelerator
7	KubeRay	8 / 28	29%	-	AI Operator
8	NVIDIA DCGM Exporter	8 / 28	29%	-	Observability
9	K8s Cluster Autoscaler	8 / 28	29%	K8s Core	Autoscaling
10	Grafana	6 / 28	21%	-	Observability
11	KEDA	5 / 28	18%	Graduated	Autoscaling
12	Traefik	5 / 28	18%	-	Networking
13	Istio	4 / 28	14%	Graduated	Networking
14	K-Gateway (GW API Inf. Ext)	4 / 28	14%	K8s SIG	Networking
15	prometheus-adapter	4 / 28	14%	K8s SIG	Observability
16	SUNK/Slurm	3 / 28	11%	- (Proprietary)	Scheduling
17	Karpenter	3 / 28	11%	K8s SIG	Autoscaling
18	Volcano	3 / 28	11%	Incubating	Scheduling
19	OpenTelemetry	3 / 28	11%	Incubating	Observability
20	PyTorch Operator	3 / 28	11%	-	AI Operator
21	metrics-server	3 / 28	11%	K8s SIG	Observability
22	DeepSpeed	2 / 28	7%	-	AI Framework
23	Cilium	2 / 28	7%	Graduated	Networking
24	NVIDIA Device Plugin	2 / 28	7%	-	Accelerator
25	NVIDIA DRA Driver	2 / 28	7%	-	Accelerator
26	NVIDIA Container Toolkit	2 / 28	7%	-	Accelerator
27	AMD GPU Operator	2 / 28	7%	-	Accelerator
28	Kai Scheduler	2 / 28	7%	Sandbox	Scheduling
29	SUSE AI	2 / 28	7%	-	AI Platform
30	Sonobuoy	2 / 28	7%	-	Testing
31	Flux	1 / 28	4%	Graduated	GitOps
32	Calico	1 / 28	4%	-	Networking
33	JobSet	1 / 28	4%	K8s SIG	Scheduling
34	KubeLB	1 / 28	4%	-	Networking
35	Yunikorn	1 / 28	4%	ASF	Scheduling
36	LeaderWorkerSet	1 / 28	4%	K8s SIG	Scheduling
37	vLLM	1 / 28	4%	-	AI Inference
38	AIBrix	1 / 28	4%	-	AI Inference
39	NVIDIA Triton	1 / 28	4%	-	AI Inference
40	KAITO	1 / 28	4%	-	AI Operator
41	KGateway/Kong	1 / 28	4%	-	Networking
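
The percentage column is the stack count divided by 28, rounded to the nearest whole number. A short sketch of that arithmetic (the function name `share` is an assumption) reproduces a few of the rows:

```python
TOTAL_STACKS = 28


def share(stacks, total=TOTAL_STACKS):
    """Percentage of stacks, rounded to the nearest whole number."""
    return round(100 * stacks / total)


# Sample rows taken from the adoption summary table.
sample = [("Kueue", 18), ("Gateway API", 17), ("Prometheus", 15), ("KEDA", 5), ("Flux", 1)]
for project, stacks in sample:
    print(f"{project}: {stacks} / {TOTAL_STACKS} = {share(stacks)}%")
```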

Open Source Project Reference

Project	GitHub Repo	License	CNCF Status	Description
Kueue	kubernetes-sigs/kueue	Apache-2.0	K8s SIG	Kubernetes-native job queueing
Volcano	volcano-sh/volcano	Apache-2.0	Incubating	Batch scheduling for K8s
Kai Scheduler	kai-scheduler/KAI-Scheduler	Apache-2.0	Sandbox	GPU-optimized AI scheduler
KubeRay	ray-project/kuberay	Apache-2.0	-	Ray on Kubernetes operator
Kubeflow	kubeflow/kubeflow	Apache-2.0	Incubating	ML platform for K8s
Kubeflow Trainer	kubeflow/training-operator	Apache-2.0	Incubating	Distributed training operator
Ray	ray-project/ray	Apache-2.0	-	Distributed AI framework
vLLM	vllm-project/vllm	Apache-2.0	-	LLM inference engine
AIBrix	vllm-project/aibrix	Apache-2.0	-	GenAI inference components
DeepSpeed	microsoft/DeepSpeed	Apache-2.0	-	Distributed training library
NVIDIA GPU Operator	NVIDIA/gpu-operator	Apache-2.0	-	GPU lifecycle management
NVIDIA DCGM Exporter	NVIDIA/dcgm-exporter	Apache-2.0	-	GPU metrics for Prometheus
NVIDIA Device Plugin	NVIDIA/k8s-device-plugin	Apache-2.0	-	GPU device plugin for K8s
NVIDIA DRA Driver	NVIDIA/k8s-dra-driver	Apache-2.0	-	DRA driver for GPUs
NVIDIA Container Toolkit	NVIDIA/nvidia-container-toolkit	Apache-2.0	-	GPU container runtime
AMD GPU Operator	ROCm/gpu-operator	Apache-2.0	-	AMD GPU management
Prometheus	prometheus/prometheus	Apache-2.0	Graduated	Monitoring system
Grafana	grafana/grafana	AGPL-3.0	-	Observability platform
OpenTelemetry	open-telemetry/opentelemetry-collector	Apache-2.0	Incubating	Telemetry collection
Istio	istio/istio	Apache-2.0	Graduated	Service mesh
Traefik	traefik/traefik	MIT	-	Cloud-native proxy
Cilium	cilium/cilium	Apache-2.0	Graduated	eBPF networking
Calico	projectcalico/calico	Apache-2.0	-	K8s networking
KEDA	kedacore/keda	Apache-2.0	Graduated	Event-driven autoscaling
Karpenter	kubernetes-sigs/karpenter	Apache-2.0	K8s SIG	Node autoscaling
K8s Cluster Autoscaler	kubernetes/autoscaler	Apache-2.0	K8s Core	Cluster autoscaling
metrics-server	kubernetes-sigs/metrics-server	Apache-2.0	K8s SIG	Resource metrics
prometheus-adapter	kubernetes-sigs/prometheus-adapter	Apache-2.0	K8s SIG	Custom metrics API
Gateway API	kubernetes-sigs/gateway-api	Apache-2.0	K8s SIG	K8s networking API
K-Gateway	kubernetes-sigs/gateway-api-inference-extension	Apache-2.0	K8s SIG	AI inference gateway
Flux	fluxcd/flux2	Apache-2.0	Graduated	GitOps toolkit
JobSet	kubernetes-sigs/jobset	Apache-2.0	K8s SIG	Multi-job orchestration
LeaderWorkerSet	kubernetes-sigs/lws	Apache-2.0	K8s SIG	LLM inference sharding
KubeLB	kubermatic/kubelb	Apache-2.0	-	Centralized load balancing
Yunikorn	apache/yunikorn-core	Apache-2.0	ASF	Resource scheduler
KAITO	microsoft/kaito	MIT	-	AI toolchain operator
NVIDIA Triton	triton-inference-server/server	BSD-3-Clause	-	Inference server
Gardener	gardener/gardener	Apache-2.0	-	K8s cluster management
Talos Linux	siderolabs/talos	MPL-2.0	-	Minimal K8s OS
RKE2	rancher/rke2	Apache-2.0	-	Secure K8s distribution
Omni	siderolabs/omni	BSL-1.1	-	Talos cluster management
podinfo	stefanprodan/podinfo	Apache-2.0	-	K8s test microservice
CubeFS (fmr ContainerFS)	cubefs/cubefs	Apache-2.0	Graduated	Distributed storage

NVIDIA Ecosystem Summary

NVIDIA dominates the accelerator layer across the AI conformance program. Every single submission relies on NVIDIA technology in some form -- either directly via open-source NVIDIA projects or indirectly through managed cloud GPU services built on NVIDIA hardware.

NVIDIA Project Adoption Across All 28 Stacks

NVIDIA Project	GitHub Repo	Stacks (of 28)	% of 28	Versions Observed
NVIDIA GPU Operator	NVIDIA/gpu-operator	9	32%	v1.0.1 (Giant Swarm), v25.3.4 (Palette)
NVIDIA DCGM Exporter	NVIDIA/dcgm-exporter	8	29%	(versions not specified)
NVIDIA Device Plugin	NVIDIA/k8s-device-plugin	2	7%	v0.14.5 (Talos)
NVIDIA DRA Driver	NVIDIA/k8s-dra-driver	2	7%	v25.3.0 (Giant Swarm)
NVIDIA Container Toolkit	NVIDIA/nvidia-container-toolkit	2	7%	(version not specified)
NVIDIA Triton Inference Server	triton-inference-server/server	1	4%	(version not specified)
Kai Scheduler (originally NVIDIA)	kai-scheduler/KAI-Scheduler	2	7%	(version not specified)

NVIDIA Project Usage by Submission (detailed)

Submission	Version	GPU Operator	DCGM Exporter	Device Plugin	DRA Driver	Container Toolkit	Triton	Kai Scheduler	NVIDIA Projects Used
chinaunicom-csk	v1.33	-	-	-	-	-	-	-	0
cks	v1.33	-	-	-	-	-	-	-	0*
daocloud	v1.33	X	-	-	-	-	-	-	1
gardener	v1.33	X	X	-	-	-	-	-	2
giantswarm	v1.33	X	-	-	X	-	-	-	2
jdcloud	v1.33	-	X	-	-	-	-	-	1
jdos	v1.33	X	X	-	-	-	-	-	2
openshift	v1.33	X	X	-	-	-	-	-	2
palette	v1.33	X	-	-	-	-	-	X	2
rke2	v1.33	-	-	-	-	-	-	-	0**
talos	v1.33	-	-	X	-	X	-	-	2
ack	v1.34	-	-	-	-	-	-	-	0*
aks	v1.34	-	-	-	-	-	-	-	0*
baidu_cce	v1.34	-	-	-	-	-	-	-	0*
cks	v1.34	-	-	-	-	-	-	-	0*
eks	v1.34	-	X	-	-	-	X	X	3
gardener	v1.34	X	X	-	-	-	-	-	2
gke	v1.34	-	-	-	-	-	-	-	0*
kubermatic	v1.34	X	-	-	-	-	-	-	1
lke	v1.34	-	-	-	-	-	-	-	0*
OKE	v1.34	-	-	X	-	-	-	-	1
openshift	v1.34	X	X	-	-	-	-	-	2
ovh	v1.34	X	X	-	X	-	-	-	3
rke2	v1.34	-	-	-	-	-	-	-	0**
talos	v1.34	-	-	X	-	X	-	-	2
vks	v1.34	-	-	-	-	-	-	-	0*
cks	v1.35	-	-	-	-	-	-	-	0*
gke	v1.35	-	-	-	-	-	-	-	0*

* Managed cloud services use NVIDIA GPUs but don't disclose specific NVIDIA software components in their submissions.
** SUSE AI stack likely includes NVIDIA components but bundles them under its own umbrella.
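
The "NVIDIA Projects Used" column can be summarized by counting the rows with a non-zero value. The sketch below transcribes that column by hand (the name `NVIDIA_PROJECTS_USED` is hypothetical) and separates stacks that name at least one NVIDIA project from those that name none.

```python
# (directory, version) -> number of NVIDIA projects named in the submission,
# transcribed from the last column of the detailed table.
NVIDIA_PROJECTS_USED = {
    ("chinaunicom-csk", "v1.33"): 0, ("cks", "v1.33"): 0, ("daocloud", "v1.33"): 1,
    ("gardener", "v1.33"): 2, ("giantswarm", "v1.33"): 2, ("jdcloud", "v1.33"): 1,
    ("jdos", "v1.33"): 2, ("openshift", "v1.33"): 2, ("palette", "v1.33"): 2,
    ("rke2", "v1.33"): 0, ("talos", "v1.33"): 2,
    ("ack", "v1.34"): 0, ("aks", "v1.34"): 0, ("baidu_cce", "v1.34"): 0,
    ("cks", "v1.34"): 0, ("eks", "v1.34"): 3, ("gardener", "v1.34"): 2,
    ("gke", "v1.34"): 0, ("kubermatic", "v1.34"): 1, ("lke", "v1.34"): 0,
    ("OKE", "v1.34"): 1, ("openshift", "v1.34"): 2, ("ovh", "v1.34"): 3,
    ("rke2", "v1.34"): 0, ("talos", "v1.34"): 2, ("vks", "v1.34"): 0,
    ("cks", "v1.35"): 0, ("gke", "v1.35"): 0,
}

total = len(NVIDIA_PROJECTS_USED)
explicit = [key for key, n in NVIDIA_PROJECTS_USED.items() if n > 0]
undisclosed = [key for key, n in NVIDIA_PROJECTS_USED.items() if n == 0]

print(f"explicit: {len(explicit)} / {total} ({round(100 * len(explicit) / total)}%)")
print(f"no NVIDIA project named: {len(undisclosed)} / {total}")
```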

NVIDIA Hardware Referenced in Submissions

GPU Model	Submissions	Notes
NVIDIA A100	giantswarm/v1.33 (AWS p4d.24xlarge), ovh/v1.34 (MIG reference)	High-end training/inference
NVIDIA A10G	palette/v1.33 (AWS)	Mid-range inference
NVIDIA Tesla T4	giantswarm/v1.33, kubermatic/v1.34	Inference-optimized
NVIDIA Tesla V100	ovh/v1.34 (V100-PCIE-16GB)	Training workhorse
NVIDIA Quadro P1000	talos/v1.33, talos/v1.34	Entry-level workstation

NVIDIA Aggregate Counts

Metric	Count
Total NVIDIA open-source projects used across all stacks	7
Stacks explicitly referencing at least 1 NVIDIA project	15 / 28 (54%)
Stacks using NVIDIA GPUs (explicit + managed cloud)	28 / 28 (100%)
Stacks using NVIDIA GPU Operator	9 (32%)
Stacks using NVIDIA DCGM Exporter	8 (29%)
Stacks using NVIDIA DRA Driver (new DRA path)	2 (7%)
Stacks using NVIDIA Device Plugin (legacy path)	2 (7%)
Stacks referencing NVIDIA Triton	1 (4%)
Stacks using Kai Scheduler (NVIDIA-originated)	2 (7%)
Unique NVIDIA GPU models documented	5 (A100, A10G, T4, V100, Quadro P1000)
Submissions with NVIDIA as only GPU vendor	23 / 28 (82%)
Submissions also supporting non-NVIDIA accelerators	5, from 3 vendors (OpenShift v1.33 and v1.34: AMD; GKE v1.34 and v1.35: TPU; EKS v1.34: Trainium/Inferentia)

NVIDIA vs Non-NVIDIA Accelerator Support

Accelerator Ecosystem	Stacks Supporting	Vendors
NVIDIA GPUs only	23	All except OpenShift, GKE, EKS
NVIDIA + AMD (ROCm)	2	Red Hat (OpenShift v1.33, v1.34)
NVIDIA + Google TPU	2	Google (GKE v1.34, v1.35)
NVIDIA + AWS Trainium/Inferentia	1	AWS (EKS v1.34)

Key NVIDIA Takeaways

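NVIDIA is counted in 100% of AI-conformant stacks: 15 of 28 name at least one NVIDIA project explicitly, and the other 13 name no NVIDIA component and are attributed to NVIDIA hardware through managed or bundled GPU offerings. On that basis it is the only universal hardware dependency in the program.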
The GPU Operator + DCGM Exporter duo is the de facto standard for self-managed platforms (32% and 29% explicit adoption), while managed clouds abstract these away.
DRA Driver adoption is nascent -- only 2 stacks (Giant Swarm, OVH) explicitly use the new k8s-dra-driver, vs 2 still on the legacy Device Plugin. Most managed clouds haven't disclosed their DRA implementation details.
NVIDIA Triton appears in only 1 submission (EKS) despite being the leading inference server, suggesting most vendors treat inference serving as user-deployed rather than platform-provided.
Kai Scheduler, originally an NVIDIA project (now CNCF Sandbox), is used by 2 stacks (Palette, EKS), positioning NVIDIA in the scheduling layer as well.
Only 5 of 28 stacks, from 3 vendors, support any non-NVIDIA accelerator: OpenShift (AMD), GKE (TPU), and EKS (Trainium/Inferentia). NVIDIA is therefore the sole accelerator vendor for the remaining 23 stacks (82%).

Key Findings

1. Market Participation

28 unique submissions from 21 distinct vendors across 3 Kubernetes versions
6 vendors have submitted for multiple versions: CoreWeave (all 3), Sidero Labs, NeoNephos Foundation, Red Hat, SUSE, and Google (2 each)
v1.34 has the most submissions (15), suggesting the program gained traction after launch
v1.35 is early with only 2 submissions (CoreWeave and Google)

2. Dominant Open Source Stack

The "default" AI-conformant Kubernetes stack converges on:

Kueue for gang scheduling (18/28 stacks, 64%)
Prometheus for observability (15/28, 54%)
Gateway API for networking (17/28, 61%)
Ray/KubeRay for AI operators (14/28, 50%)
NVIDIA GPU Operator + DCGM Exporter for accelerator management (9/28 and 8/28)
Kubernetes Cluster Autoscaler or Karpenter for scaling

3. Scheduling Landscape

Kueue dominates gang scheduling with 64% adoption
Volcano is a distant second (3 stacks: JD Cloud, OVHcloud, and EKS)
AWS EKS stands out by supporting the most schedulers (Volcano, Kai, Yunikorn, LeaderWorkerSet, AWS Batch)

4. Accelerator Diversity

NVIDIA GPUs are universal -- every submission uses NVIDIA in some form
AMD GPUs (ROCm): Only Red Hat OpenShift
Google TPU: Only GKE
AWS Trainium/Inferentia: Only EKS
GPU sharing/MIG: Only GKE (v1.35) explicitly covers SHOULD-level requirements

5. CNCF Project Penetration

Among the most-used projects:

5 CNCF Graduated: Prometheus, KEDA, Istio, Cilium, Flux
3 CNCF Incubating: Kubeflow, Volcano, OpenTelemetry
1 CNCF Sandbox: Kai Scheduler
7 Kubernetes SIG projects: Kueue, Gateway API, Karpenter, Cluster Autoscaler, metrics-server, prometheus-adapter, K-Gateway
NVIDIA projects dominate the accelerator layer (not CNCF)

6. Notable Gaps

Most managed cloud services (ACK, AKS, CCE, GKE, EKS, OKE, LKE) don't specify internal stack details
Container runtime and CNI are rarely documented (only 5-6 submissions specify these)
Test artifacts (e2e logs) are rare -- only DaoCloud provides actual test output
The v1.35 SHOULD-level requirements (GPU sharing, vGPU, driver management) are only addressed by GKE

7. Vendor Differentiation

AWS EKS: Broadest accelerator and scheduler ecosystem (Trainium, Inferentia, 5 schedulers, vLLM, AIBrix, Triton)
Red Hat OpenShift: Only dual-GPU vendor (NVIDIA + AMD), most complete operator detail (Kubeflow Trainer V1)
CoreWeave CKS: Only vendor with all 3 versions, GPU-native cloud, proprietary SUNK scheduler
Google GKE: Only TPU support, only v1.35 submission covering all SHOULD requirements
Sidero Labs Talos: Only bare-metal-first OS submission, most transparent about hardware limitations
