
PoC: Multi-Network / DRA-Driver / Network Status

This PoC covers three topics:

  1. ResourceClaim Status for Networking
  2. Container Runtime as DRA Driver
  3. Network DRA Driver as Re-Usable Framework/Pattern

Repositories:

Summary:

Flow

Containerd-PoC

  1. Kubelet calls NodePrepareResources on the DRA Driver with the list of claim names/UIDs to prepare.
  2. The claims are retrieved from the Kubernetes API. The devices are then prepared (stored so they can be used when CNI ADD is called) and returned from the NodePrepareResources call.
  3. Kubelet calls RunPodSandbox on the Container Runtime to create the Pod.
  4. During RunPodSandbox, the claims for the pod being handled are retrieved from the store populated in step 2, and the CNI plugins are called based on the information contained in the claims.
  5. The status is set and updated via the Kubernetes API, then the RunPodSandbox call returns.
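The hand-off between steps 2 and 4 can be pictured as a small in-memory store. The following is a minimal illustration, not the PoC's code. The `Claim` and `ClaimStore` names are hypothetical. Keying by pod UID assumes the pod is known from the claim's `status.reservedFor` when preparing, and from the sandbox metadata in RunPodSandbox.

```go
package main

import "sync"

// Claim is a hypothetical, simplified view of a prepared ResourceClaim:
// just enough to call a CNI plugin later on.
type Claim struct {
	Name      string // ResourceClaim name
	UID       string // ResourceClaim UID
	Interface string // interface name from the opaque parameters, e.g. "net1"
	CNIConfig string // CNI network configuration list (JSON) from the opaque parameters
}

// ClaimStore keeps the claims prepared in NodePrepareResources (step 2)
// until RunPodSandbox needs them (step 4). It is guarded by a mutex because
// the DRA gRPC calls and the CRI calls arrive on different goroutines.
type ClaimStore struct {
	mu    sync.Mutex
	byPod map[string][]Claim // pod UID -> claims reserved for that pod
}

// NewClaimStore returns an empty store.
func NewClaimStore() *ClaimStore {
	return &ClaimStore{byPod: make(map[string][]Claim)}
}

// Prepare records a claim for the pod it is reserved for.
func (s *ClaimStore) Prepare(podUID string, c Claim) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byPod[podUID] = append(s.byPod[podUID], c)
}

// ClaimsForPod returns a copy of the claims prepared for the pod, so callers
// cannot mutate the stored slice.
func (s *ClaimStore) ClaimsForPod(podUID string) []Claim {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]Claim(nil), s.byPod[podUID]...)
}
```

NodeUnprepareResources would remove entries in the same way; that path is omitted here.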

PoC

ResourceClaim Status for Networking

API: https://github.com/LionelJouin/kubernetes/blob/dra-device-status/pkg/apis/resource/types.go#L1102

The ResourceClaimStatus has been extended with a new field, DeviceStatuses, which lists AllocatedDeviceStatus entries:

  • AllocatedDeviceStatus: the status of an allocated device, as reported by its driver. It provides two ways to report the actual data of the device:
    • DeviceInfo: a field accepting arbitrary data, like the opaque parameters (.spec.devices.config.opaque.parameters).
    • NetworkDeviceInfo: a field dedicated to network devices (interface name, IPs, and MAC address).
// ResourceClaimStatus tracks whether the resource has been allocated and what
// the result of that was.
type ResourceClaimStatus struct {
	...

	// DeviceStatuses contains the status of each device allocated for this
	// claim, as reported by the driver. This can include driver-specific
	// information. Entries are owned by their respective drivers.
	//
	// +optional
	// +listType=map
	// +listMapKey=driver
	// +listMapKey=device
	// +listMapKey=pool
	DeviceStatuses []AllocatedDeviceStatus `json:"deviceStatuses,omitempty" protobuf:"bytes,4,opt,name=deviceStatuses"`
}


// AllocatedDeviceStatus contains the status of an allocated device, if the
// driver chooses to report it. This may include driver-specific information.
type AllocatedDeviceStatus struct {
	// Request is the name of the request in the claim which caused this
	// device to be allocated. Multiple devices may have been allocated
	// per request.
	//
	// +required
	Request string `json:"request" protobuf:"bytes,1,rep,name=request"`

	// Driver specifies the name of the DRA driver whose kubelet
	// plugin should be invoked to process the allocation once the claim is
	// needed on a node.
	//
	// Must be a DNS subdomain and should end with a DNS domain owned by the
	// vendor of the driver.
	//
	// +required
	Driver string `json:"driver" protobuf:"bytes,2,rep,name=driver"`

	// This name together with the driver name and the device name field
	// identify which device was allocated (`<driver name>/<pool name>/<device name>`).
	//
	// Must not be longer than 253 characters and may contain one or more
	// DNS sub-domains separated by slashes.
	//
	// +required
	Pool string `json:"pool" protobuf:"bytes,3,rep,name=pool"`

	// Device references one device instance via its name in the driver's
	// resource pool. It must be a DNS label.
	//
	// +required
	Device string `json:"device" protobuf:"bytes,4,rep,name=device"`

	// Conditions contains the latest observation of the device's state.
	// If the device has been configured according to the class and claim
	// config references, the `Ready` condition should be True.
	//
	// +optional
	// +listType=atomic
	Conditions []metav1.Condition `json:"conditions" protobuf:"bytes,5,rep,name=conditions"`

	// DeviceInfo contains arbitrary driver-specific data.
	//
	// +optional
	DeviceInfo runtime.RawExtension `json:"deviceInfo,omitempty" protobuf:"bytes,6,rep,name=deviceInfo"`

	// NetworkDeviceInfo contains network-related information specific to the device.
	//
	// +optional
	NetworkDeviceInfo NetworkDeviceInfo `json:"networkDeviceInfo,omitempty" protobuf:"bytes,7,rep,name=networkDeviceInfo"`
}

// NetworkDeviceInfo provides network-related details for the allocated device.
// This information may be filled by drivers or other components to configure
// or identify the device within a network context.
type NetworkDeviceInfo struct {
	// Interface specifies the name of the network interface associated with
	// the allocated device. This might be the name of a physical or virtual
	// network interface.
	//
	// +optional
	Interface string `json:"interface,omitempty" protobuf:"bytes,1,rep,name=interface"`

	// IPs lists the IP addresses assigned to the device's network interface.
	// This can include both IPv4 and IPv6 addresses.
	//
	// +optional
	IPs []string `json:"ips,omitempty" protobuf:"bytes,2,rep,name=ips"`

	// Mac represents the MAC address of the device's network interface.
	//
	// +optional
	Mac string `json:"mac,omitempty" protobuf:"bytes,3,rep,name=mac"`
}
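Entries are owned by their respective drivers, and a device is identified by its driver, pool, and device names. A driver reporting status would therefore typically replace only its own entry. The following is a minimal sketch using simplified mirrors of the types above. Conditions and DeviceInfo are dropped to avoid Kubernetes dependencies. `upsertDeviceStatus` is a hypothetical helper, not part of the API.

```go
package main

// NetworkDeviceInfo is a simplified mirror of the PoC type.
type NetworkDeviceInfo struct {
	Interface string   `json:"interface,omitempty"`
	IPs       []string `json:"ips,omitempty"`
	Mac       string   `json:"mac,omitempty"`
}

// AllocatedDeviceStatus is a simplified mirror of the PoC type, without
// Conditions and DeviceInfo (they need k8s.io/apimachinery types).
type AllocatedDeviceStatus struct {
	Request           string            `json:"request"`
	Driver            string            `json:"driver"`
	Pool              string            `json:"pool"`
	Device            string            `json:"device"`
	NetworkDeviceInfo NetworkDeviceInfo `json:"networkDeviceInfo"`
}

// upsertDeviceStatus replaces, in place, the entry with the same
// driver/pool/device triple, or appends s if there is none. Entries that
// belong to other drivers are left untouched.
func upsertDeviceStatus(statuses []AllocatedDeviceStatus, s AllocatedDeviceStatus) []AllocatedDeviceStatus {
	for i := range statuses {
		if statuses[i].Driver == s.Driver && statuses[i].Pool == s.Pool && statuses[i].Device == s.Device {
			statuses[i] = s
			return statuses
		}
	}
	return append(statuses, s)
}
```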

Here is an example of the final ResourceClaim for the demo shown in this PoC:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: macvlan-eth0-attachment
spec:
  devices:
    config:
    - opaque:
        driver: poc.dra.networking
        parameters:
          config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [ {
            "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
            "host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
          interface: net1
      requests:
      - macvlan-eth0
    requests:
    - allocationMode: ExactCount
      count: 1
      deviceClassName: cni-v1
      name: macvlan-eth0
status:
  allocation:
    devices:
      config:
      - opaque:
          driver: poc.dra.networking
          parameters:
            config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [
              { "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
              "host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
            interface: net1
        requests:
        - macvlan-eth0
        source: FromClaim
      results:
      - device: cni
        driver: poc.dra.networking
        pool: kind-worker
        request: macvlan-eth0
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - kind-worker
  deviceStatuses:
  - conditions: null
    device: cni
    deviceInfo:
      cniVersion: 1.0.0
      interfaces:
      - mac: 1e:32:6c:b7:c9:66
        name: net1
        sandbox: /var/run/netns/cni-5b7c0846-7995-9450-f441-a177399d08d5
      ips:
      - address: 10.10.1.2/24
        gateway: 10.10.1.1
        interface: 0
    driver: poc.dra.networking
    networkDeviceInfo:
      interface: net1
      ips:
      - 10.10.1.2/24
      mac: 1e:32:6c:b7:c9:66
    pool: kind-worker
    request: macvlan-eth0
  reservedFor:
  - name: demo-a
    resource: pods
    uid: 2bd46adf-b478-4e25-9e37-828539799169

Container Runtime as DRA Driver

The Networking DRA Driver runs inside Containerd, so the NRI plugin required in previous PoCs (LionelJouin/network-dra / aojea/kubernetes-network-driver) is no longer needed. However, Containerd now needs access to the Kubernetes API in order to get the ResourceClaims (on NodePrepareResources, step 1 of the flow) and to update the ResourceClaim status (after CNI ADD, step 5 of the flow).

This PoC uses the kubelet kubeconfig to access the API, so the kubelet credentials must be allowed to update the ResourceClaim status. In Kind, Containerd starts before kubelet, so this PoC keeps retrying in a goroutine until the kubeconfig is available. Once the kubeconfig is retrieved, Containerd also registers itself as a DRA plugin. (Could the status be extended to advertise the availability of the networking DRA Driver?)

When a pod is created, its default (primary) network is set up first, and the additional networks are set up right after.

Network DRA Driver as Re-Usable Framework/Pattern

As highlighted by the aojea/kubernetes-network-driver PoC, a reusable DRA Driver for networking could be created. NodePrepareResources retrieves the ResourceClaims to be used and stores them. When the function that adds the networks is called (on RunPodSandbox), the ResourceClaims are already known and can easily be retrieved to add the networks to the pod and update the status.

Build

Clone Kind

git clone git@github.com:kubernetes-sigs/kind.git
cd kind

Build Kind base image

make -C images/base quick EXTRA_BUILD_OPT="--build-arg CONTAINERD_CLONE_URL=https://github.com/LionelJouin/containerd --build-arg CONTAINERD_VERSION=dra-cni --no-cache" TAG=dra-cni

Clone the Kubernetes fork

git clone git@github.com:kubernetes/kubernetes.git
cd kubernetes
git remote add LionelJouin git@github.com:LionelJouin/kubernetes.git
git fetch LionelJouin
git checkout LionelJouin/dra-device-status

Build Kind image

kind build node-image . --image kindest/node:dra-cni-status --base-image gcr.io/k8s-staging-kind/base:dra-cni

Demo

Kind Cluster config:

---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  "DynamicResourceAllocation": true
  "DRAControlPlaneController": true
runtimeConfig:
  "resource.k8s.io/v1alpha3": true
kubeadmConfigPatches:
- |
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  logging:
    verbosity: 10
- |
  kind: ClusterConfiguration
  apiServer:
      extraArgs:
        v: "4"
  scheduler:
      extraArgs:
        v: "4"
  controllerManager:
      extraArgs:
        v: "4"
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri"]
    enable_cdi = true
  [plugins.'io.containerd.grpc.v1.cri'.cni]
    cni_dra = true
nodes:
- role: control-plane
  image: kindest/node:dra-cni-status
- role: worker
  image: kindest/node:dra-cni-status

Install CNI Plugins:

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/e2e/templates/cni-install.yml.j2

Apply ResourceSlice:

cat <<EOF | kubectl apply -f -
---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlice
metadata:
  name: kind-worker-poc-dra-networking
spec:
  devices:
  - name: cni
    basic:
      attributes:
        name:
          string: "eth0"
  driver: poc.dra.networking
  nodeName: kind-worker
  pool:
    name: kind-worker
    resourceSliceCount: 1
EOF

Apply DeviceClass:

cat <<EOF | kubectl apply -f -
---
apiVersion: resource.k8s.io/v1alpha3
kind: DeviceClass
metadata:
  name: cni-v1
EOF

Apply ResourceClaim and Pod:

cat <<EOF | kubectl apply -f -
---
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: macvlan-eth0-attachment
spec:
  devices:
    requests:
    - name: macvlan-eth0
      deviceClassName: cni-v1
    config:
    - requests:
      - macvlan-eth0
      opaque:
        driver: poc.dra.networking
        parameters:
          interface: "net1"
          config: '{
  "cniVersion": "1.0.0",
  "name": "macvlan-eth0",
  "plugins": [
    {
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "ranges": [
          [
            {
              "subnet": "10.10.1.0/24"
            }
          ]
        ]
      }
    }
  ]
}'
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-a
spec:
  containers:
  - name: alpine
    image: alpine:latest
    imagePullPolicy: IfNotPresent
    command:
    - sleep
    - infinity
  resourceClaims:
  - name: macvlan-eth0-attachment
    resourceClaimName: macvlan-eth0-attachment
EOF
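The `parameters` field is a RawExtension, so the opaque parameters reach the driver as raw JSON even though they are written as YAML above. The following is a minimal sketch of decoding and sanity-checking them. It assumes only the two keys used in this demo, `interface` and `config`. `parseOpaqueParameters` and its checks are illustrative, not the PoC's validation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// opaqueParameters matches the keys used in this demo's
// .spec.devices.config[].opaque.parameters.
type opaqueParameters struct {
	Interface string `json:"interface"`
	Config    string `json:"config"` // CNI network configuration list, as a JSON string
}

// cniConfList is the subset of a CNI network configuration list checked here.
type cniConfList struct {
	CNIVersion string           `json:"cniVersion"`
	Name       string           `json:"name"`
	Plugins    []map[string]any `json:"plugins"`
}

// parseOpaqueParameters decodes the raw parameters and performs basic checks
// before they are stored for CNI ADD.
func parseOpaqueParameters(raw []byte) (opaqueParameters, cniConfList, error) {
	var p opaqueParameters
	var conf cniConfList
	if err := json.Unmarshal(raw, &p); err != nil {
		return p, conf, fmt.Errorf("decoding parameters: %w", err)
	}
	if p.Interface == "" {
		return p, conf, errors.New("parameters: interface is required")
	}
	// The CNI config is itself JSON embedded in a string, so decode it separately.
	if err := json.Unmarshal([]byte(p.Config), &conf); err != nil {
		return p, conf, fmt.Errorf("decoding CNI config: %w", err)
	}
	if conf.Name == "" || len(conf.Plugins) == 0 {
		return p, conf, errors.New("CNI config: name and at least one plugin are required")
	}
	for i, pl := range conf.Plugins {
		if t, _ := pl["type"].(string); t == "" {
			return p, conf, fmt.Errorf("CNI config: plugin %d has no type", i)
		}
	}
	return p, conf, nil
}
```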

Verify the resource claim status:

kubectl get resourceclaims macvlan-eth0-attachment -o yaml

Verify the pod interfaces:

kubectl exec -it demo-a -- ip a

Resources
