@dims
dims / dra-driver-nvidia-gpu-ci-coverage.md
Created April 21, 2026 17:04
CI Coverage Map — sigs.k8s.io/dra-driver-nvidia-gpu (Lambda/GCP-nvkind/mock-nvml providers, BATS suites, TestGrid tabs, GPU_TYPE= resolution, gap analysis)

CI Coverage Map — sigs.k8s.io/dra-driver-nvidia-gpu

As of 2026-04-21. Sources: .github/workflows/, kubernetes/test-infra (config/jobs/kubernetes-sigs/dra-driver-nvidia-gpu/, config/testgrids/nvidia/nvidia.yaml), testgrid.k8s.io/nvidia-gpu, hack/ci/{gcp-nvkind,lambda,mock-nvml}, tests/bats/, test/e2e/.

TL;DR

  • 3 execution surfaces: GitHub Actions (lint/unit/mock-e2e only), Prow on Lambda Cloud (real GPUs, BATS), Prow on GCP-nvkind (T4 GCE, Ginkgo).
  • 7 Prow jobs on this repo: 3 e2e presubmits + 3 e2e periodics + 1 image-push postsubmit.
  • Only Lambda/arm64 (GH200) gives real arm64 GPU coverage. GCP-nvkind is amd64/T4 only.
  • Nothing is truly a required check. GitHub branch protection on main and release-25.8 lists EasyCLA as the only required status. No rulesets are configured. Every CI signal above (GH Actions lint/unit/mock-e2e and all Prow e2e presubmits, which are marked optional: true) posts a status but cannot block merge. Merge gating is effectively: EasyCLA + tide/OW
@dims
dims / mock-nvml-bats-test-analysis.md
Last active April 16, 2026 15:09
Mock NVML GB200 Emulation: Deep-dive, BATS Test Analysis, and Test Results

Mock NVML BATS Test Compatibility Analysis

Date: 2026-04-15
Environment: CPU-only Kind cluster, 8x mock GB200 NVL, driver 570.170.01
Branch: worktree-mock-nvml-gb200-ci-v2

Environment Constraints

  • nvidia-smi works, shows 8x NVIDIA GB200 NVL with correct attributes
  • NVML queries work (name, UUID, memory, architecture, compute capability)
@dims
dims / 2026-04-12-lambda-gpu-test-roadmap-v2.md
Created April 12, 2026 20:41
Lambda Cloud GPU Test Coverage: What's Next (v2 roadmap)

Lambda Cloud GPU Test Coverage: What's Next

Date: 2026-04-12
Scope: Forward-looking roadmap for expanding DRA GPU driver test coverage on Lambda Cloud. Covers only what remains to be done, not what's already landed or in flight.

Prerequisite: PRs #1025, #1027, #1028 should be merged first. After they land, Lambda CI runs 25 tests across 6 test files covering basic GPU allocation, CUDA workloads, Dynamic MIG, TimeSlicing, MPS, DRAExtendedResource, Prometheus metrics, CEL selectors, claim lifecycle, and robustness.


1. Zero-Code Wins: Add Existing Tests to Lambda CI

@dims
dims / 2026-04-11-lambda-gpu-test-coverage-roadmap.md
Last active April 12, 2026 19:41
Lambda Cloud GPU Test Coverage Roadmap for dra-driver-nvidia-gpu - comprehensive analysis of testable features, QA plan comparison, and implementation phases

Lambda Cloud GPU Test Coverage Roadmap for dra-driver-nvidia-gpu

Date: 2026-04-11
Scope: Comprehensive analysis of what features of the DRA driver can be tested on Lambda Cloud, what we already cover, what's feasible to add, and what's out of reach.


PR Tracking

| PR | Repo | Status | Description |

@dims
dims / lambda-gpu-testing-guide.md
Created April 11, 2026 11:25
Running DRA GPU Tests on Lambda Cloud (without Prow) - step by step guide

Running DRA GPU Tests on Lambda Cloud (Without Prow)

This guide walks you through running the nvidia DRA driver GPU tests on a Lambda Cloud GPU instance, the same way our CI does it — but from your laptop.

Prerequisites

  1. Lambda Cloud API key — sign up at lambdalabs.com, go to Settings > API Keys, create one. Set it:
    mkdir -p ~/.lambda
    echo "YOUR_API_KEY_HERE" > ~/.lambda/api-key

@dims
dims / toc.md
Last active April 15, 2026 16:28
TOC Member Qualifications & Criteria

TOC Member Qualifications & Criteria

Charter-Defined Qualifications (Charter 6(d))

Nominees must:

  1. Bandwidth — commit available time to invest in the CNCF TOC
  2. Technical Expertise — demonstrate an advanced level of professional experience as leaders in the scope of CNCF
  3. Seniority — demonstrate seniority sufficient to access staff/community resources to assist in TOC preparations
  4. Neutrality — operate neutrally in discussions and balance CNCF goals with corporate objectives or any particular project
@dims
dims / toc-members-timeline.svg
Last active April 9, 2026 02:45
CNCF TOC Members Timeline
@dims
dims / k8s-e2e-node-lambda-cloud.md
Created April 2, 2026 02:41
Running Kubernetes GPU e2e Tests on Lambda Cloud

Running Kubernetes GPU e2e Tests on Lambda Cloud

Step-by-step guide to launch a Lambda Cloud GPU instance, set up a single-node Kubernetes cluster with NVIDIA GPU support, and run GPU e2e tests like [sig-node] [Feature:GPUDevicePlugin] [Serial] Test using a Job should run gpu based jobs.

Prerequisites (on your Mac)

  1. lambdactl CLI at ~/go/src/github.com/dims/lambdactl/bin/lambdactl
  2. Lambda API key stored at ~/.config/lambda/.key (or set LAMBDA_API_KEY env var)
  3. SSH key registered with Lambda (see step 1b below)
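
The two API key locations named in prerequisite 2 can be checked with a small helper before launching anything. This is a minimal sketch, not part of lambdactl: the function name `resolve_lambda_api_key` is hypothetical, and the choice to let the `LAMBDA_API_KEY` environment variable take precedence over the key file is an assumption, since the guide only says either source may be used.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union


def resolve_lambda_api_key(
    env: Optional[Mapping[str, str]] = None,
    key_file: Optional[Union[str, Path]] = None,
) -> Optional[str]:
    """Return the Lambda API key, or None if neither source provides one.

    Assumption: the LAMBDA_API_KEY environment variable wins over the file.
    """
    env = os.environ if env is None else env
    if key_file is None:
        # Default location taken from the prerequisites list.
        key_file = Path.home() / ".config" / "lambda" / ".key"
    key_file = Path(key_file)

    value = env.get("LAMBDA_API_KEY", "").strip()
    if value:
        return value

    if key_file.is_file():
        # Strip the trailing newline that `echo` would leave in the file.
        text = key_file.read_text().strip()
        return text or None
    return None
```

Calling `resolve_lambda_api_key()` with no arguments reads the real environment and the default path; passing `env` and `key_file` explicitly makes the lookup easy to exercise without touching either.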
@dims
dims / 2026-03-16-k8s-ai-conformance-analysis.md
Last active March 16, 2026 22:20
CNCF K8s AI Conformance Analysis - 2026-03-16 (SHA: 223d15f)

CNCF Kubernetes AI Conformance - Full Analysis

Date: 2026-03-16
Repository: github.com/cncf/k8s-ai-conformance
Commit SHA: 223d15f97434ea478f1440d73901435d16503682
Branch: main


Table of Contents

@dims
dims / nvsentinel-external-contributors.md
Last active March 2, 2026 15:51
nvsentinel-external-contributors.md

nvsentinel — External Contributor Report

Generated: 2026-03-02
Repo: nvidia/nvsentinel
Total commits analyzed: 380
Methodology: Extracted all unique commit authors → checked email domains → verified GitHub handles against GET /orgs/NVIDIA/members/{username} (HTTP 204 = confirmed member, 404 = not a member) → cross-referenced public GitHub profiles and LinkedIn → checked every commit for DCO Signed-off-by trailer.
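
Two of the methodology steps are mechanical enough to sketch: interpreting the status code from the membership endpoint and detecting a DCO trailer in a commit message. The snippet below is an illustrative sketch only, not the script used for this report. The function names and the trailer regular expression are assumptions, and treating any status other than 204 or 404 as inconclusive is a choice made here, not something the report states.

```python
import re

# Assumed trailer shape: "Signed-off-by: Name <email@host>" on its own line.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def classify_membership(status_code: int) -> str:
    """Map the HTTP status from GET /orgs/NVIDIA/members/{username} to a label."""
    if status_code == 204:
        return "member"
    if status_code == 404:
        return "not_member"
    # Any other status (redirect, rate limit, auth error) is left undecided.
    return "unknown"


def has_dco_signoff(commit_message: str) -> bool:
    """Return True if the commit message contains a Signed-off-by trailer."""
    return SIGNOFF_RE.search(commit_message) is not None
```

For example, `classify_membership(204)` yields `"member"`, and `has_dco_signoff` accepts a message whose last line is `Signed-off-by: Jane Doe <jane@example.com>`. The HTTP request itself is deliberately left out so the sketch stays free of network and credential assumptions.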


DCO Status: All External Commits Are Signed ✅