apiVersion: batch/v1
kind: CronJob
metadata:
  name: example
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      activeDeadlineSeconds: 600
# v1.21
$ kind create cluster --image kindest/node:v1.21.1
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
# v1.23
$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
gengwg@gengwg-mbp:~$ kind create cluster --image kindest/node:v1.23.5
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.23.5) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
gengwg@gengwg-mbp:~$ kind create cluster --image kindest/node:v1.23.6
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.23.6) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
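The transcripts above create one cluster per Kubernetes version by changing only the node image tag. A minimal sketch of a helper that prints the matching command for each version follows. The function name `kind_cmd` and the `k8s-<version>` cluster naming are assumptions for illustration and are not part of the original notes, which reuse the default cluster name "kind". The helper only echoes the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the kind command for a given Kubernetes version.
# The cluster name is derived from the version (an assumed naming scheme) so
# that several clusters do not collide on the default name "kind".
kind_cmd() {
  kc_version="$1"                                         # e.g. v1.23.6
  kc_name="k8s-$(printf '%s' "$kc_version" | tr '.' '-')"  # e.g. k8s-v1-23-6
  echo "kind create cluster --name ${kc_name} --image kindest/node:${kc_version}"
}

# Print one command per version used in the transcripts above.
for v in v1.21.1 v1.23.5 v1.23.6; do
  kind_cmd "$v"
done
```

Piping the output to `sh` would run the commands; keeping it as a dry run by default makes it easy to check the image tags first.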
gengwg@gengwg-mbp:~/go$ ls
pkg
gengwg@gengwg-mbp:~/go$ g git@github.com:kubernetes/kubernetes.git
Cloning into 'kubernetes'...
Enter passphrase for key '/Users/gengwg/.ssh/id_rsa':
remote: Enumerating objects: 1324661, done.
remote: Counting objects: 100% (174/174), done.
remote: Compressing objects: 100% (113/113), done.
remote: Total 1324661 (delta 80), reused 61 (delta 61), pack-reused 1324487
Receiving objects: 100% (1324661/1324661), 834.92 MiB | 12.02 MiB/s, done.
# v1.22.9
## build image
gengwg@gengwg-mbp:~$ cd go/src/k8s.io/kubernetes/
gengwg@gengwg-mbp:~/go/src/k8s.io/kubernetes$ git checkout v1.22.9
Updating files: 100% (6336/6336), done.
Previous HEAD position was ad3338546da Release commit for Kubernetes v1.23.6
HEAD is now at 6df4433e288 Release commit for Kubernetes v1.22.9
@gengwg
gengwg / nvml_cgroupv2_fix.md
Last active July 10, 2024 07:10
Fix for jobs that initially see the GPUs fine but suddenly lose NVML after a few hours

NOTE: This seems to have fixed our cluster. However, I do see some people still reporting the same issue on cgroup v2, for example here. So YMMV.

DISCLAIMER: This seems to work in our environment but may not work in others. I'm still not sure what the real root cause(s) are. I'm not even 100% sure it fully fixes the problem in our environment; it has been good for 2 weeks. But if it reappears (for example, under certain use cases such as high load), I'll be doomed.

TLDR

Switching to cgroup v2 seems to have fixed the issue of NVML suddenly going away inside pods.
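The note does not show how to tell which cgroup version a node is running. One common check, assumed here rather than taken from the original, is to look at the filesystem type mounted at `/sys/fs/cgroup`: `cgroup2fs` indicates the unified v2 hierarchy, while `tmpfs` indicates a v1 layout. A minimal sketch, where the function name `cgroup_version` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a
# cgroup version. cgroup2fs -> v2 (unified); tmpfs -> v1 layout.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# On a Linux node, stat prints the filesystem type of the cgroup mount point.
# If the path or the GNU stat flags are unavailable, fall back to a marker
# value that the helper reports as "unknown".
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unavailable")
echo "cgroup version: $(cgroup_version "$fstype")"
```

Running this on each GPU node before and after the switch gives a quick confirmation that the node actually booted with cgroup v2.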

Problem

gengwg / debian_unable_mount_rootfs.md
Created November 25, 2022 00:15
not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Problem

Similar to

wn-block(0,0)
[    0.667378] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.47-1-MANJARO #1
[    0.667435] Hardware name: Acer Aspire E5-575G/Ironman_SK  , BIOS V1.04 04/26/2016
[    0.667493]  ffffc90000c8bde0 ffffffff813151d2 ffff880276a77000 ffffffff8190b950
[    0.667717]  ffffc90000c8be68 ffffffff8117ecd4 ffffffff00000010 ffffc90000c8be78
gengwg / dnf-reposync.md
Created January 21, 2023 01:06
Using DNF to Download/Sync with Local Repo


Command:

# download to the current directory
$ dnf reposync --repoid=windscribe --download-metadata -p .
Windscribe    4.6 kB/s | 2.9 kB     00:00
Windscribe    8.2 kB/s |  11 kB     00:01