Skip to content

Instantly share code, notes, and snippets.

@PatrickLang
Last active July 10, 2019 00:41
Show Gist options
  • Save PatrickLang/a92766f23d39b442e5c8d92cb4ed05f2 to your computer and use it in GitHub Desktop.
Save PatrickLang/a92766f23d39b442e5c8d92cb4ed05f2 to your computer and use it in GitHub Desktop.
Bugs in unit parsing

Background

https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

Limits and requests for memory are measured in bytes. You can express memory as a plain integer or as a fixed-point integer using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent roughly the same value:

128974848, 129e6, 129M, 123Mi

Bug

If someone uses a lowercase letter, for example m, that has inconsistent behavior between Windows & Linux. m is a valid suffix for CPU but not memory.

Note: docker command line options do support m, b, k and g. docker resource constraints

On Linux, the m prefix is dropped if it's included as part of a memory resource. This resource allocation will create an allocation of 300 bytes, not 300 Megabytes or Mebibytes. That's too small and container startup will fail. See full error logs below.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-test1
spec:
  containers:
  - name: nginx
    image: library/nginx
    resources:
      limits:
        cpu: 1
        memory: 800m
    ports:
      - containerPort: 80
  nodeSelector:
    "beta.kubernetes.io/os": linux

On Windows, the m prefix is treated the same as M. This resource allocation will be 300 Megabytes, and container startup will succeed.

apiVersion: v1
kind: Pod
metadata:
  name: iis-test2
spec:
  containers:
  - name: iis
    image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
    resources:
      limits:
        cpu: 1
        memory: 800m
    ports:
      - containerPort: 80
  nodeSelector:
    "beta.kubernetes.io/os": windows

This would have the same effect on resources as docker run --memory 800m -d mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019 on Windows.

Proposed Fixes

There are a few different places this could be fixed. They're not mutually exclusive, so I'm asking SIG-Node and API reviewers to consider the impact of each.

Parsing cpu & memory limits separately is not feasible as the same string parser is shared for all resource types.

Fix 1 - reject memory limits below a certain threshold in the admission controller

This would be the most consistent as the failure would happen immediately, and be consistent on both Windows & Linux. Let's start with a strawman proposal of 10Mi as the container minimum. Windows actually needs more like 200Mi, but I think 10Mi would be enough to catch the most egregious mistakes such as "10m".

Source Locations:

https://github.com/kubernetes/kubernetes/blob/f47da8a75d13590322a4e1df502cf6629b2c36fa/plugin/pkg/admission/limitranger/admission.go#L213

Fix 2 - reject memory limits with m in kubelet

This would fail later, but won't involve changes to the apiserver, common APIs or admission controller. The error messages can still be made consistent between Linux & Windows, but different minimums can be enforced in each side. This could be implemented in the code that creates the container configs before passing them to CRI.

Source locations:

Linux: https://github.com/kubernetes/kubernetes/blob/33081c1f07be128a89441a39c467b7ea2221b23d/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go#L38 Windows: https://github.com/kubernetes/kubernetes/blob/b39d8f4777b3a01359046d05b1de17298b818f2f/pkg/kubelet/kuberuntime/kuberuntime_container_windows.go#L34

Full error logs

Here's what the status looks like for the failed pod:

$ kubectl get pod
NAME                                       READY   STATUS              RESTARTS   AGE
iis-test1                                  1/1     Running             0          10m
iis-test2                                  1/1     Running             0          10m
nginx-test1                                1/1     Running             0          10m
nginx-test2                                0/1     ContainerCreating   0          10m

patrick@planglx1:~/gist/a92766f23d39b442e5c8d92cb4ed05f2$ kubectl describe pod nginx-test2
Name:               nginx-test2
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               aks-nodepool1-20293662-vmss000000/10.240.0.4
Start Time:         Tue, 09 Jul 2019 23:44:09 +0000
Labels:             <none>
Annotations:        <none>
Status:             Pending
IP:                 
Containers:
  nginx:
    Container ID:   
    Image:          library/nginx
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  800m
    Requests:
      cpu:        100m
      memory:     300m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jrvwx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-jrvwx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jrvwx
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                                        Message
  ----     ------                  ----                   ----                                        -------
  Normal   Scheduled               10m                    default-scheduler                           Successfully assigned default/nginx-test2 to aks-nodepool1-20293662-vmss000000
  Warning  FailedCreatePodSandBox  10m                    kubelet, aks-nodepool1-20293662-vmss000000  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "nginx-test2": Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown
  Warning  FailedCreatePodSandBox  10m                    kubelet, aks-nodepool1-20293662-vmss000000  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "nginx-test2": Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:297: getting the final child's pid from pipe caused \"read init-p: connection reset by peer\"": unknown
  Warning  FailedCreatePodSandBox  5m52s (x230 over 10m)  kubelet, aks-nodepool1-20293662-vmss000000  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "nginx-test2": Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:297: getting the final child's pid from pipe caused \"EOF\"": unknown
  Normal   SandboxChanged          52s (x538 over 10m)    kubelet, aks-nodepool1-20293662-vmss000000  Pod sandbox changed, it will be killed and re-created.
apiVersion: v1
kind: Pod
metadata:
name: iis-test2
spec:
containers:
- name: iis
image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
resources:
limits:
cpu: 1
memory: 800m
requests:
cpu: .1
memory: 300m
ports:
- containerPort: 80
nodeSelector:
"beta.kubernetes.io/os": windows
apiVersion: v1
kind: Pod
metadata:
name: iis-test1
spec:
containers:
- name: iis
image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
resources:
limits:
cpu: 1
memory: 800Mi
requests:
cpu: .1
memory: 300Mi
ports:
- containerPort: 80
nodeSelector:
"beta.kubernetes.io/os": windows
apiVersion: v1
kind: Pod
metadata:
name: nginx-test2
spec:
containers:
- name: nginx
image: library/nginx
resources:
limits:
cpu: 1
memory: 800m
requests:
cpu: .1
memory: 300m
ports:
- containerPort: 80
nodeSelector:
"beta.kubernetes.io/os": linux
apiVersion: v1
kind: Pod
metadata:
name: nginx-test1
spec:
containers:
- name: nginx
image: library/nginx
resources:
limits:
cpu: 1
memory: 800Mi
requests:
cpu: .1
memory: 300Mi
ports:
- containerPort: 80
nodeSelector:
"beta.kubernetes.io/os": linux
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment