
@dims
Created April 21, 2026 17:04
CI Coverage Map — sigs.k8s.io/dra-driver-nvidia-gpu (Lambda/GCP-nvkind/mock-nvml providers, BATS suites, TestGrid tabs, GPU_TYPE= resolution, gap analysis)

CI Coverage Map — sigs.k8s.io/dra-driver-nvidia-gpu

As of 2026-04-21. Sources: .github/workflows/, kubernetes/test-infra (config/jobs/kubernetes-sigs/dra-driver-nvidia-gpu/, config/testgrids/nvidia/nvidia.yaml), testgrid.k8s.io/nvidia-gpu, hack/ci/{gcp-nvkind,lambda,mock-nvml}, tests/bats/, test/e2e/.

TL;DR

  • 3 execution surfaces: GitHub Actions (lint/unit/mock-e2e only), Prow on Lambda Cloud (real GPUs, BATS), Prow on GCP-nvkind (T4 GCE, Ginkgo).
  • 7 Prow jobs on this repo: 3 e2e presubmits + 3 e2e periodics + 1 image-push postsubmit.
  • Only Lambda/arm64 (GH200) gives real arm64 GPU coverage. GCP-nvkind is amd64/T4 only.
  • Nothing is truly a required check. GitHub branch protection on main and release-25.8 lists EasyCLA as the only required status. No rulesets configured. Every CI signal above — GH Actions lint/unit/mock-e2e and all 3 Prow e2e presubmits (optional: true) — posts status but cannot block merge. Merge gating is effectively: EasyCLA + tide/OWNERS approval.
  • No CI ever runs tests-cd (ComputeDomain full suite on real NVLink fabric). Only tests-mock-nvml and tests-gpu-single are wired.
  • DynMIG is exercised in CI: test_gpu_dynmig.bats is in tests-gpu-single, and hack/ci/lambda/e2e-test.sh leaves DynMIG enabled on *h100*|*gh200*|*b200*, so every Lambda GH200 run does hit a dynamic-MIG path. Static MIG (test_gpu_mig.bats) still never runs in CI.
  • Lambda x86 jobs use GPU_TYPE="" → lambdactl watch picks the cheapest SKU currently available in any region. Last 10 periodic runs: 10/10 gpu_1x_a10. Last 10 presubmit runs: 5× A10, 2× A100 SXM4, 3× blocked on gpu_8x_v100_n quota-exceeded (non-retryable; contributes to the 50% presubmit flake).

Table 1 — Master CI job matrix

Every distinct job/workflow that runs against this repo. Housekeeping bots (stale, cherry-pick, issue-triage) are excluded; see end of section.

Columns: "Gates" = Prow-level configuration only (e.g. always_run, optional, max_concurrency). No job in this table is a merge-required check — see TL;DR on branch protection.

| # | Job / Workflow | Platform | Type | Trigger / Cadence | Provider | GPU | Arch | K8s | Suite / Target | TestGrid tab | Gates | Status 2026-04-21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ci.yaml → golang check | GH Actions | PR + push to main/release-* | every PR/push | none | | amd64 | | make golangci-lint, generated-code check, go mod validate | | | |
| 2 | ci.yaml → golang test | GH Actions | PR + push | every PR/push | none | | amd64 | | make test (Go unit) | | | |
| 3 | ci.yaml → golang build | GH Actions | PR + push | every PR/push | none | | amd64 | | make build | | | |
| 4 | ci.yaml → image | GH Actions | PR + push | every PR/push | none | | amd64+arm64 (QEMU) | | make build multi-arch OCI (no push) | | | |
| 5 | ci.yaml → chart | GH Actions | PR + push | every PR/push | none | | amd64 | | helm lint + package | | | |
| 6 | code_scanning.yaml | GH Actions | called from basic-checks | every PR/push | none | | amd64 | | CodeQL Go | | | |
| 7 | mock-nvml-e2e.yaml | GH Actions | PR (paths-filtered) + push main | on-PR | mock-nvml (Kind + mocked NVML) | virtual 8×GB200 | amd64 | latest stable | BATS tests-mock-nvml | | | |
| 8 | tests.yaml | GH Actions | workflow_dispatch only | manual | placeholder (echoes "bats runs on Prow") | | | | noop | | | |
| 9 | pull-dra-driver-nvidia-gpu-e2e-lambda-gpu | Prow | presubmit | every PR (skip release-*) | Lambda Cloud (kubeadm on bare metal) | GPU_TYPE="" → cheapest-available (see §GPU selection). Recent: A10 71%, A100 SXM4 29% | amd64 | latest stable | BATS tests-gpu-single | pull-dra-driver-nvidia-gpu-lambda | always_run:true, optional:true, max_concurrency:1, 2h | FLAKY 50% |
| 10 | pull-dra-driver-nvidia-gpu-e2e-lambda-gpu-gh200 | Prow | presubmit | every PR (skip release-*) | Lambda Cloud | GH200 (1×) | arm64 | latest stable | BATS tests-gpu-single | pull-dra-driver-nvidia-gpu-lambda-gh200 | always_run:true, optional:true, max_concurrency:1, 2h | FLAKY 50% |
| 11 | pull-dra-driver-nvidia-gpu-e2e-gcp-nvkind | Prow | presubmit | every PR (skip release-*) | GCP-nvkind (GCE VM → nvkind) | T4 (1×) | amd64 | v1.34.3 (Ubuntu 22.04 DLVM) | Ginkgo test/e2e/ | pull-dra-driver-nvidia-gpu-gcp-nvkind | always_run:true, optional:true, max_concurrency:1, 2h, Boskos gpu-project | PASSING |
| 12 | ci-dra-driver-nvidia-gpu-e2e-lambda-gpu | Prow | periodic | interval: 6h | Lambda Cloud | GPU_TYPE="" → cheapest-available. Recent 10/10: A10 @ us-east-1 | amd64 | latest stable | BATS tests-gpu-single | ci-dra-driver-nvidia-gpu-lambda | 2h | PASSING 100% |
| 13 | ci-dra-driver-nvidia-gpu-e2e-lambda-gpu-gh200 | Prow | periodic | cron: 30 0,6,12,18 * * * (6h, offset) | Lambda Cloud | GH200 (1×) | arm64 | latest stable | BATS tests-gpu-single | ci-dra-driver-nvidia-gpu-lambda-gh200 | 2h | PASSING 100% |
| 14 | ci-dra-driver-nvidia-gpu-e2e-gcp-nvkind | Prow | periodic | interval: 6h | GCP-nvkind | T4 (1×) | amd64 | v1.35.1 (Ubuntu 24.04 DLVM) | Ginkgo test/e2e/ | ci-dra-driver-nvidia-gpu-gcp-nvkind | 2h, Boskos gpu-project | FLAKY 70% (7/10 recent columns; live testgrid snapshot — numbers move) |
| 15 | post-dra-driver-nvidia-gpu-push-images | Prow | postsubmit | merge to main, release-*, SemVer tags | GCB | | | | image-builder run.sh → push to k8s-staging-images | sig-node-image-pushes, sig-k8s-infra-gcb | trusted cluster | |

Excluded (housekeeping bots): cherrypick.yml, issue-triage.yml, stale.yml (daily cron 04:30 UTC).

Notes on the master table:

  • The Prow periodic GCP-nvkind pins v1.35.1 + Ubuntu 24.04 while the presubmit pins v1.34.3 + Ubuntu 22.04 — deliberate drift so periodics smoke-test newer k8s/OS.
  • The GH200 periodic uses a cron (30 0,6,12,18) instead of interval: to offset 3h from the sibling ci-kubernetes-e2e-lambda-device-plugin-gpu-gh200 and avoid GH200 capacity contention.
  • All Lambda jobs carry preset preset-lambda-credential → injects LAMBDA_API_KEY_FILE=/etc/lambda-cred/api-key.
  • All e2e jobs use the same container: us-central1-docker.pkg.dev/k8s-staging-test-infra/images/kubekins-e2e:v20260316-e86cefa561-master.

Table 2 — BATS test × SUITE selector matrix

Which .bats file is passed to bats under each make -f tests/bats/Makefile <target>. Transcribed directly from tests/bats/Makefile (tests-mock-nvml:187, tests-gpu-single:204, tests-gpu:214, tests-cd:225, tests:236). File-included is not the same as test-executed: tests-mock-nvml sets MOCK_NVML=true, under which several tests auto-skip (per-@test guards), and hack/ci/mock-nvml/e2e-test.sh also applies --filter-tags exclusions (!cuda-workload,!dynmig,!mig,!compute-domain,!multi-node,!gpu-busgrind,!version-specific).

| BATS file | tests (full) | tests-gpu | tests-gpu-single | tests-mock-nvml | tests-cd | Hardware requirement |
|---|---|---|---|---|---|---|
| test_basics.bats | ✓ | ? | ? | ? | ? | none (sanity; expects GPU Operator) |
| test_gpu_basic.bats | ✓ | ? | ? | ? | | any GPU |
| test_gpu_extres.bats | ✓ | ? | ? | ? | | K8s ≥1.35 + DRAExtendedResource |
| test_gpu_robustness.bats | | ? | ? | ? | | any GPU |
| test_gpu_stress.bats | ✓ | ✓ | ? | ? | | any GPU |
| test_gpu_updowngrade.bats | ✓ | ✓ | ? | ? | | prior-release image in registry |
| test_gpu_sharing.bats | | ? | ? | ? | | any GPU (real MPS daemon for one case) |
| test_gpu_dynmig.bats | ✓ | ? | ✓ | ? | | MIG-capable GPU + DynamicMIG=true |
| test_gpu_mig.bats | ✓ | ✓ | | ? | | MIG-capable (A100/H100/B200/GB200) |
| test_gpu_cuda_workloads.bats | | ? | ✓ | ✓ (see note: 2 of 4 tests actually run under MOCK) | | real CUDA compute (2 tests); other 2 just use ResourceClaimTemplate semantics |
| test_cd_imex_chan_inject.bats | ✓ | | | ✓ (tests auto-skip on MOCK_NVML=true) | ✓ | IMEX daemon (Blackwell + drv ≥570.158.01) |
| test_cd_logging.bats | ✓ | | | ✓ (auto-skip on MOCK) | ✓ | IMEX daemon |
| test_cd_misc.bats | ✓ | | | ✓ (auto-skip on MOCK) | ✓ | IMEX daemon |
| test_cd_updowngrade.bats | ✓ | | | ✓ (auto-skip on MOCK) | ✓ | IMEX daemon + prior-release image |
| test_cd_failover.bats | ✓ | | | ✓ (auto-skip on MOCK) | ✓ | multi-node NVLink fabric (≥2 nodes, 4 GPU/node) |
| test_cd_mnnvl_workload.bats | ✓ | | | ✓ (auto-skip on MOCK) | ✓ | multi-node NVLink fabric, real NCCL, MPI Operator |
| Files included | 13/16 | 7/16 | 6/16 | 13/16 | 7/16 | |
| Invoked by CI | not invoked | not invoked | Lambda presubmit + periodic (both arch) | GH Actions mock-nvml-e2e | not invoked | |

(? = consult the Makefile selectors cited above for the exact per-file membership.)

Takeaways:

  • tests is not "all 16 bats files" — it excludes robustness, sharing, and cuda_workloads.
  • tests-mock-nvml includes all 6 CD files and cuda_workloads, but:
    • CD files: every @test in test_cd_*.bats starts with a MOCK_NVML skip guard (tests/bats/test_cd_imex_chan_inject.bats:17 etc.), so they contribute ~zero executed assertions on the mock runner.
    • cuda_workloads: the mock-runner filter !cuda-workload is a no-op — no test in that file carries the cuda-workload tag (they're tagged gpu-workloads and fastfeedback). Of the 4 tests in test_gpu_cuda_workloads.bats: the CUDA-demo-suite test (line 31) and the busGrind test (line 118) skip via MOCK_NVML guards; the Job-with-ResourceClaimTemplate and Deployment-2-replicas tests (lines 52, 82) do NOT skip and actually execute on the mock runner. So mock-nvml does exercise RCT/deployment paths, just not real CUDA compute.
  • tests-gpu-single includes test_gpu_dynmig, so dynamic-MIG paths do get exercised in CI on GPUs the Lambda driver leaves unfiltered (H100 / GH200 / B200). Static MIG (test_gpu_mig) is only in tests / tests-gpu, neither of which is wired to CI.
  • The comment in hack/ci/mock-nvml/e2e-test.sh:377-379 ("We skip test_gpu_cuda_workloads.bats because it includes a CUDA demo suite test …") is stale — the file is actually included via tests-mock-nvml, and skipping happens per-test via MOCK_NVML guards, not at the file level.
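
The "!cuda-workload is a no-op" point is mechanical: a tag-exclusion filter only removes tests that actually carry the tag. A minimal sketch, with illustrative test names and the tag sets the takeaway above describes:

```shell
# None of the four tests carries 'cuda-workload', so excluding that tag
# removes nothing (names are illustrative stand-ins, not the real @test names).
excluded_tag="cuda-workload"
kept=0
for entry in \
  "cuda_demo_suite:gpu-workloads,fastfeedback" \
  "job_with_rct:gpu-workloads,fastfeedback" \
  "deployment_2_replicas:gpu-workloads,fastfeedback" \
  "busgrind:gpu-workloads,fastfeedback"
do
  tags=",${entry#*:},"
  case "$tags" in
    *",$excluded_tag,"*) ;;            # would be filtered out
    *) kept=$((kept + 1)) ;;           # survives the filter
  esac
done
echo "tests surviving '!$excluded_tag': $kept of 4"
```

Under these tags the filter keeps all four; only the per-test MOCK_NVML guards then skip the two real-CUDA tests.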

Table 3 — Provider × arch × suite coverage slice

A compact view of what actually runs where.

| Provider | Caller | Arch | GPU model | Real CUDA? | DynMIG? | Static MIG? | IMEX / ComputeDomain? | Multi-GPU? | Multi-node? | Suite run |
|---|---|---|---|---|---|---|---|---|---|---|
| GH Actions runner + mock-nvml | GH Actions PR | amd64 | 8× virtual GB200 | partial (non-tag tests run; the CUDA-demo-suite and busGrind tests auto-skip on MOCK) | ✗ (filter !dynmig) | ✗ (filter !mig) | files included, but every CD test auto-skips on MOCK_NVML=true | ✓ (virtual 8×) | ✗ (filter !multi-node) | tests-mock-nvml (13 files included, many skip at runtime) |
| Lambda (x86, A10) | Prow presubmit + 6h periodic | amd64 | A10 (most common) | ✓ | ✗ (A10 not MIG-capable → !dynmig) | ✗ | ✗ (CD disabled unless `gb200 gb300 b200`) | ✗ (1×) | ✗ | tests-gpu-single (6 files) |
| Lambda (x86, A100) | Prow presubmit + 6h periodic (when A10 unavailable) | amd64 | A100 SXM4 40GB (1×) | ✓ | ✗ (single-GPU A100 → !dynmig per e2e-test.sh:112-120) | ✗ | ✗ (CD disabled unless `gb200 gb300 b200`) | ✗ (1×) | ✗ | tests-gpu-single (6 files) |
| Lambda (arm64) | Prow presubmit + 6h periodic | arm64 | GH200 (1×) | ✓ (no busGrind — arm64 apt limitation) | ✓ (GH200 matches *gh200*, DynMIG enabled) | ✗ | ✗ (CD only on `gb200 gb300 b200`) | ✗ (1×) | ✗ | tests-gpu-single (6 files) |
| GCP-nvkind | Prow presubmit + 6h periodic | amd64 | T4 (1×) | ✗ (runs Ginkgo, not the BATS CUDA tests) | ✗ (T4 not MIG-capable) | ✗ | ✗ | ✗ (1×) | ✗ | Ginkgo test/e2e/ (6 specs) |

Table 4 — Other nvidia-gpu TestGrid tabs (context, not this repo)

The nvidia-gpu rollup on testgrid also displays 10 tabs from the NVIDIA device-plugin (k/k) program. Listed for context only — they do not test this driver but share the dashboard:

| Tab | Job | Status |
|---|---|---|
| ci-kubernetes-e2e-ec2-device-plugin-gpu | periodic | FLAKY 80% |
| ci-lambda-device-plugin-gpu | periodic | PASSING |
| ci-lambda-device-plugin-gpu-gh200 | periodic | FLAKY 70% |
| gce-device-plugin-gpu-{1.33,1.34,1.35,1.36,master} | periodic | PASSING / master FLAKY 90% |
| pull-kubernetes-e2e-ec2-device-plugin-gpu | presubmit | STALE (last run 2026-03-18) |
| pull-lambda-device-plugin-gpu | presubmit | FLAKY 50% |

How GPU_TYPE="" actually resolves on Lambda

The two Prow jobs ci-dra-driver-nvidia-gpu-e2e-lambda-gpu and pull-dra-driver-nvidia-gpu-e2e-lambda-gpu pass GPU_TYPE="". The resolution happens in two layers:

Layer 1 — experiment/lambda/lib/lambda-common.sh (test-infra):

LAMBDA_GPU_TYPE="${GPU_TYPE-gpu_1x_a10}"   # '-' not ':-'  → empty stays empty
...
if [ -n "${LAMBDA_GPU_TYPE}" ]; then
  gpu_args=(--gpu "${LAMBDA_GPU_TYPE}")
fi
lambdactl --json watch "${gpu_args[@]}" --ssh ... --interval 30 --timeout 900 --wait-ssh

When empty, --gpu is omitted entirely — no filter, no region pin.
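
The `-` expansion (rather than the more common `:-`) is what lets an empty GPU_TYPE flow through; a minimal sketch of the difference:

```shell
# '-' applies the default only when the variable is UNSET; an exported empty
# string survives. ':-' applies the default when unset OR empty.
GPU_TYPE=""                                # the Prow jobs export this empty
with_dash="${GPU_TYPE-gpu_1x_a10}"         # stays empty
with_colon_dash="${GPU_TYPE:-gpu_1x_a10}"  # default kicks in
echo "dash=[${with_dash}] colon-dash=[${with_colon_dash}]"
unset GPU_TYPE
after_unset="${GPU_TYPE-gpu_1x_a10}"       # unset: now the default applies
echo "after-unset=[${after_unset}]"
```

So a job that simply doesn't set GPU_TYPE would get gpu_1x_a10; a job that sets it empty gets the unfiltered cheapest-available path.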

Layer 2 — lambdactl watch (dims/lambdactl, cmd/watch.go):

  1. Poll lambdactl types every 30s for up to 900s.
  2. Keep types with at least one region currently showing availability.
  3. Sort by PriceCents ascending, pick candidates[0].
  4. Launch into Regions[0] of that type.
  5. On a retryable capacity error → continue the loop and re-poll. On a quota error → hard-fail (not retryable).

After the launch returns, the script overwrites LAMBDA_GPU_TYPE with the actual provisioned type so BATS capability gating works on what really got allocated.
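
Steps 2-3 of the watch loop reduce to a cheapest-first pick over types with capacity; a toy shell model (catalog lines are illustrative price-cents/SKU/region-count triples, not live lambdactl output):

```shell
# Keep SKUs with at least one available region, sort by price ascending,
# take the head: that is the instance type the watch loop launches.
catalog='199 gpu_1x_a100_sxm4 2
129 gpu_1x_a10 1
218 gpu_2x_a6000 0
229 gpu_1x_gh200 1'
picked=$(printf '%s\n' "$catalog" \
  | awk '$3 > 0' \
  | sort -n -k1,1 \
  | head -n1 \
  | awk '{print $2}')
echo "picked: $picked"
```

With these numbers the A6000 (no capacity) drops out and the A10, as cheapest remaining, wins, which matches the observed 10/10 periodic outcome.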

Lambda instance catalog (from lambdactl types, snapshot 2026-04-21)

Cheapest-first, so first-available is what gets picked:

| Rank | SKU | $/hr | GPU | Arch | Current avail |
|---|---|---|---|---|---|
| 1 | gpu_1x_a10 | $1.29 | A10 24GB PCIe | x86 | 1 region |
| 2 | gpu_1x_a100_sxm4 | $1.99 | A100 40GB SXM4 | x86 | 2 regions |
| 3 | gpu_2x_a6000 | $2.18 | 2×A6000 48GB | x86 | 0 |
| 4 | gpu_1x_gh200 | $2.29 | GH200 96GB | arm64 | 1 region |
| 5 | gpu_1x_h100_pcie | $3.29 | H100 80GB PCIe | x86 | 1 region |
| 6 | gpu_1x_h100_sxm5 | $4.29 | H100 80GB SXM5 | x86 | 1 region |
| 7 | gpu_8x_v100_n | $6.32 | 8×V100 16GB | x86 | 1 region |
| 8 | gpu_1x_b200_sxm6 | $6.99 | B200 180GB SXM6 | x86 | 1 region |
| | (heavier SKUs) | $8.38–$53.52 | 2×/4×/8× H100/B200/A100 | x86 | 0 |

Actual SKUs landed — last 10 runs (as of 2026-04-21)

Periodic ci-dra-driver-nvidia-gpu-e2e-lambda-gpu:

gpu_1x_a10 @ us-east-1   ##########   10/10 (100%)

All ten runs, A10 @ us-east-1. The cheapest SKU has been consistently available during periodic windows.

Presubmit pull-dra-driver-nvidia-gpu-e2e-lambda-gpu (last 10 attempts):

Actually launched (7/10):
  gpu_1x_a10         @ us-east-1   #####   5
  gpu_1x_a100_sxm4   @ us-east-1/us-west-2  ##   2

Pre-launch quota-failed (3/10):
  gpu_8x_v100_n      @ us-south-2   ###   3   ← hard fail, no retry

Three consecutive presubmit failures on 2026-04-18 all hit the same trap: Lambda advertised gpu_8x_v100_n@us-south-2 as available (cheapest-with-capacity at that moment), lambdactl raced to launch it, and the account returned Quota exceeded, which lambdactl treats as non-retryable. This is a real contributor to the 50% flake on the presubmit tab.

One clean example of the capacity-retry path (build 2045306201683529728): gpu_1x_a10 @ us-west-1 hit "Not enough capacity" three times, then gpu_1x_a100_sxm4 @ us-east-1 became cheapest-available on the next poll and launched.

Implications

  • "Lambda x86" ≠ A10. It is A10 most of the time, A100 SXM4 when the A10 pool is tight, and can be any other advertised SKU when the cheap pools are empty (the gpu_8x_v100_n attempts show the picker reaching well past GH200's price point).
  • MIG never fires even when A100 lands: the job invokes tests-gpu-single, which excludes test_gpu_mig, and hack/ci/lambda/e2e-test.sh filters !dynmig on a single-GPU A100. So the rare A100 runs are wasted for MIG coverage.
  • Quota-exceeded on gpu_8x_v100_n is a latent bug. Either the test-infra account gets its V100-8x quota raised, or lambdactl watch needs to learn to treat quota errors as retryable (with a short deny-list for that poll-loop iteration).

Gap analysis — what is missing for this repo

GPU hardware coverage

  • In practice CI lands on: T4 (GCP-nvkind), A10 (Lambda x86, dominant), A100 SXM4 40GB (Lambda x86 fallback, occasional), GH200 (Lambda arm64). Everything else Lambda advertises (H100 PCIe/SXM5, B200 SXM6, V100) can be selected whenever it is momentarily the cheapest SKU with capacity, but in the last ~20 runs every successful launch went to the cheaper SKUs (A10, A100).
  • Only tests-mock-nvml exercises GB200/B200 profiles — all synthetic. On this runner: every CD test skips at runtime (MOCK_NVML guards), and 2 of 4 cuda_workloads tests skip; the remaining 2 (RCT + 2-replica Deployment) do execute.
  • Static MIG (test_gpu_mig.bats) never runs in CI — only appears in tests/tests-gpu, neither of which is invoked.
  • Dynamic MIG (test_gpu_dynmig.bats) runs only on Lambda GH200 — it's in tests-gpu-single, but hack/ci/lambda/e2e-test.sh filters it out except for *h100*|*gh200*|*b200*. The x86 presubmit/periodic (A10, single-GPU A100) always filter !dynmig. If Lambda ever lands H100 PCIe or B200 on the x86 job, those would also exercise DynMIG.

Test-suite coverage

  • tests-cd (full ComputeDomain suite) is not run in any CI — the failover, logging, misc, multi-node workload, and CD-updowngrade tests only run locally or via manual /test overrides if someone wires it up.
  • tests-gpu (full GPU suite, includes MIG / stress / updowngrade) is not run in any CI — Lambda jobs use the -single subset.
  • Static MIG (test_gpu_mig) never executes in CI; dynamic MIG (test_gpu_dynmig) executes only on the Lambda GH200 jobs, because no other wired GPU both is MIG-capable and survives the e2e-test.sh DynMIG filter.
  • Real-CUDA bats tests (the CUDA-demo-suite and busGrind tests inside test_gpu_cuda_workloads.bats) only run on Lambda (A10/A100/GH200). The other two tests in the same file (RCT + 2-replica Deployment) also run on mock-nvml. Nothing in GCP-nvkind exercises this file at all — it runs Ginkgo test/e2e/, not BATS.

Architecture coverage

  • arm64 is covered only by Lambda GH200. mock-nvml-e2e.yaml runs on ubuntu-latest (amd64) with a multi-arch buildx image but the runtime is amd64.
  • GCP-nvkind is hard-coded amd64 (linux-amd64 download in setup-nvkind-node.sh).

Kubernetes version coverage

  • GCP-nvkind periodic alone smoke-tests v1.35.1. Lambda uses "latest stable" unpinned.
  • Release branches (release-*): all three Prow e2e presubmits have skip_branches: [release-\d+\.\d+], so release branches get no e2e presubmit gating. Periodics are main-only (extra_refs: ...@main). Release branches only get GH-Actions lint/unit/mock-nvml.
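
The skip_branches pattern can be sanity-checked as a plain ERE (grep has no \d, so [0-9] stands in; the branch names are examples):

```shell
# Branches matching the skip pattern get no e2e presubmits.
skipped=""
for branch in main release-25.8 release-26.0 feature-x; do
  if printf '%s\n' "$branch" | grep -Eq '^release-[0-9]+\.[0-9]+$'; then
    skipped="$skipped $branch"     # e2e presubmits are skipped here
  fi
done
echo "presubmit-skipped branches:$skipped"
```

Both numbered release branches match, so a cherry-pick to release-25.8 is gated by nothing beyond EasyCLA plus the GH Actions signals.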

Multi-node / NVLink fabric

  • No CI runs multi-node. All ComputeDomain failover/MNNVL tests require ≥2 nodes with 4 GPUs each — nothing in hack/ci/* provisions that topology.

Optionality / blocking

  • All 3 Prow e2e presubmits (pull-*-lambda-gpu, pull-*-lambda-gpu-gh200, pull-*-gcp-nvkind) are optional: true — post status but cannot block.
  • GitHub branch protection on main and release-25.8 lists EasyCLA as the only required status check (verified via gh api); rulesets are empty. That means no GH-Actions job (not lint, not unit, not image, not mock-nvml-e2e) is a required check either. A PR can merge with every CI job red as long as EasyCLA is green and tide/OWNERS approval lands. Effective merge gates: EasyCLA + LGTM/approval.

Stability (live testgrid snapshot 2026-04-21; these numbers move — re-check via curl -s https://testgrid.k8s.io/nvidia-dra/summary)

  • ci-dra-driver-nvidia-gpu-gcp-nvkind periodic: FLAKY 70% (7/10 recent columns).
  • pull-dra-driver-nvidia-gpu-lambda presubmit: FLAKY 50% (5/10).
  • pull-dra-driver-nvidia-gpu-lambda-gh200 presubmit: FLAKY 50% (1/2 — very low sample).
  • All three Lambda periodics: PASSING 100% recent.
  • With optional: true and chronic presubmit flake, signal is weak.
  • No testgrid-alert-email on any DRA-driver tab. Failures do not page anyone.
  • gpu_8x_v100_n quota-exceeded: Lambda account advertises capacity for an SKU it has no quota for; lambdactl watch treats quota errors as non-retryable and hard-fails. Three of the last ten presubmit attempts died this way. Fix options: (a) raise the V100-8x quota, (b) make quota errors retryable with a per-poll deny-list, or (c) set an explicit allow-list on the Prow job (e.g., GPU_TYPE=gpu_1x_a10,gpu_1x_a100_sxm4,gpu_1x_h100_pcie) so V100-8x is never considered.
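
Fix option (b) amounts to a small change in the poll loop; a sketch with a stand-in launch_instance (not the real lambdactl code; the SKU behavior is scripted to mimic the 2026-04-18 failures):

```shell
# On a quota error, deny-list the SKU and keep polling instead of hard-failing.
launch_instance() {
  [ "$1" = "gpu_8x_v100_n" ] && { echo "quota exceeded"; return 1; }
  echo "launched"
}
deny_list=""
result=""
for sku in gpu_8x_v100_n gpu_1x_a10; do     # cheapest-first candidate order
  case " $deny_list " in *" $sku "*) continue ;; esac
  if out=$(launch_instance "$sku"); then
    result="$sku"; break
  elif [ "$out" = "quota exceeded" ]; then
    deny_list="$deny_list $sku"             # later polls skip this SKU
  fi
done
echo "landed on: $result"
```

With this shape, the V100-8x quota failure costs one loop iteration instead of the whole job.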

Secrets / credential surface

  • Lambda API key (k8s secret lambda-ai-api-key) + Boskos-leased GCP project. Both relatively narrow — consistent with "off-cluster heavy lifting" pattern (no DinD or privileged on the Prow pod).

References (raw URLs)

Prow jobs

Testgrid

Repo

  • .github/workflows/mock-nvml-e2e.yaml
  • hack/ci/{lambda,gcp-nvkind,mock-nvml}/e2e-test.sh
  • tests/bats/Makefile (SUITE selectors)
  • test/e2e/ (Ginkgo)