RBLN Kubernetes Device Plugin
BREAKING CHANGE
As of v0.4.0, the device plugin sets the useGenericResourceName option to true by default. As a result, the resource name changes from rebellions.ai/ATOM to rebellions.ai/npu. To keep the old resource name, set the useGenericResourceName option to false.
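Before upgrading, it can help to locate workload manifests that still request the old resource name. The following is a minimal sketch and not part of the device plugin: find_old_resource_name is a hypothetical helper, and the ./manifests directory in the usage comment is an assumed location for your Pod specs.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that still reference
# the pre-v0.4.0 resource name (rebellions.ai/ATOM).
# Usage (directory name is an assumption): find_old_resource_name ./manifests
find_old_resource_name() {
    search_dir="$1"
    # -r: recurse, -l: print matching file names only, -F: literal match
    grep -rlF 'rebellions.ai/ATOM' "$search_dir"
}
```

Files reported by the helper need either the new rebellions.ai/npu name or a plugin configured with useGenericResourceName set to false.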
The RBLN SDK provides a Kubernetes device plugin to support RBLN NPUs in a Kubernetes cluster environment.
Step 1. Prepare NPU nodes
First, prepare Kubernetes nodes equipped with RBLN NPUs and install the RBLN Driver. On the cloud server you are using, a stable version of the RBLN Driver is typically already installed. If rbln-smi, a command-line interface (CLI) utility included in the RBLN Driver package, lists the RBLN NPUs, you can skip the driver installation. For more information about the installation, refer to the Installation Guide.
| $ rbln-smi
Mon May 27 03:28:44 2024
+-------------------------------------------------------------------------------------------------+
| Device Infomation KMD ver: 0.12.37-6eed3e9-release |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU | Name | Device | PCI BUS ID | Temp | Power | Memory(used/total) | Util |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0 | RBLN-CA02 | rbln0 | 0000:b6:00.0 | 25C | 6.2W | 0.0B / 15.7GiB | 0.0 |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
| Context Infomation |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
|
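The check described above can be scripted when preparing several nodes. This is a minimal sketch: driver_cli_status is a hypothetical helper name, and it only reports whether the CLI is on PATH. It does not prove that the NPUs themselves are visible, so run rbln-smi afterwards and inspect its output.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
# Defaults to rbln-smi, the CLI shipped with the RBLN Driver package.
driver_cli_status() {
    cli_name="${1:-rbln-smi}"
    if command -v "$cli_name" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# Usage on an NPU node:
#   [ "$(driver_cli_status)" = "found" ] && rbln-smi
```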
CDI Configuration (Required)
The device plugin requires CDI to be enabled in the container runtime. You can configure containerd manually (and restart it), or let the Container Toolkit handle CDI setup.
containerd v1.7.x (/etc/containerd/config.toml)
| [plugins]
[plugins."io.containerd.grpc.v1.cri"]
enable_cdi = true
cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
|
containerd v2.x.x (/etc/containerd/config.toml)
| [plugins]
[plugins."io.containerd.cri.v1.runtime"]
enable_cdi = true
cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
|
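To confirm that a node's containerd configuration has CDI switched on, a quick text check can be enough. The sketch below assumes the default config path shown above; cdi_enabled is a hypothetical helper that only matches the enable_cdi = true line and does not verify which plugin section it sits in.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the containerd config contains
# an "enable_cdi = true" line, fail otherwise.
cdi_enabled() {
    config_file="${1:-/etc/containerd/config.toml}"
    grep -Eq '^[[:space:]]*enable_cdi[[:space:]]*=[[:space:]]*true' "$config_file"
}

# Usage (assumes a systemd-managed containerd for the restart step):
#   cdi_enabled || echo "CDI is not enabled in /etc/containerd/config.toml"
#   sudo systemctl restart containerd   # after editing the config
```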
When CDI is enabled, the device plugin adds a CDI annotation that references rebellions.ai/npu=runtime. This allows required libraries and tools (such as rbln-smi) to be automatically mounted when a workload requests the NPU resource (for example, rebellions.ai/npu).
Step 2. Install Device Plugin
We recommend installing the device plugin through the NPU Operator. If you need a standalone installation, use the following commands.
| $ git clone https://github.com/RBLN-SW/k8s-device-plugin
$ cd k8s-device-plugin/deployments/helm/rbln-device-plugin
$ helm install rbln-device-plugin . -n kube-system
|
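In automation, it is often useful to block until the DaemonSet has finished rolling out before scheduling NPU workloads. A minimal sketch using kubectl rollout status follows; wait_for_device_plugin is a hypothetical wrapper, and the 120s default timeout is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: wait until the rbln-device-plugin DaemonSet in
# kube-system has rolled out on all selected nodes, or time out.
wait_for_device_plugin() {
    rollout_timeout="${1:-120s}"
    kubectl rollout status daemonset/rbln-device-plugin \
        -n kube-system --timeout="$rollout_timeout"
}

# Usage:
#   wait_for_device_plugin 60s && echo "device plugin is ready"
```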
After installation, the rbln-device-plugin DaemonSet appears in the kube-system namespace, along with the Pods created from the DaemonSet:
| $ kubectl get ds -n kube-system rbln-device-plugin
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
rbln-device-plugin   4         4         4       4            4           kubernetes.io/arch=amd64   26s
$ kubectl get pod -n kube-system -l name=rbln-device-plugin -o wide
NAME                       READY   STATUS    RESTARTS   AGE
rbln-device-plugin-5tj4z   1/1     Running   0          41s
rbln-device-plugin-7zqgh   1/1     Running   0          41s
rbln-device-plugin-ddtts   1/1     Running   0          41s
rbln-device-plugin-zrw8s   1/1     Running   0          41s
|
You can also check the RBLN NPU resources (rebellions.ai/npu) with the kubectl describe command as below:
| $ kubectl describe node <your-node-name>
...
Capacity:
  ...
  rebellions.ai/npu: 1
Allocatable:
  ...
  rebellions.ai/npu: 1
...
|
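The same value can be read without scanning the full describe output by using a kubectl JSONPath query. This is a sketch: npu_allocatable is a hypothetical helper name. The backslash escapes the dot inside the resource name so that JSONPath treats rebellions.ai/npu as a single key.

```shell
#!/bin/sh
# Hypothetical helper: print the number of allocatable rebellions.ai/npu
# resources on one node.
npu_allocatable() {
    node_name="$1"
    # "\." keeps the dot in "rebellions.ai" from being read as a path separator.
    kubectl get node "$node_name" \
        -o jsonpath='{.status.allocatable.rebellions\.ai/npu}'
}

# Usage:
#   npu_allocatable <your-node-name>
```

An empty result suggests that the device plugin has not registered the resource on that node.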
Step 3. Create a Pod with NPUs
To create a Pod that uses NPUs, request the rebellions.ai/npu resource under spec.containers[].resources.limits in your Pod spec, as shown below:
| apiVersion: v1
kind: Pod
metadata:
  name: rbln-device-plugin-testpod
spec:
  containers:
    - name: ubuntu
      image: ubuntu:latest
      imagePullPolicy: IfNotPresent
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 300000; done;"]
      resources:
        requests:
          rebellions.ai/npu: 1
        limits:
          rebellions.ai/npu: 1
|
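To request a different number of NPUs, only the resource count needs to change. The sketch below generates a variant of the spec above from a shell function; npu_pod_manifest and its arguments are hypothetical, and it sets only limits, since Kubernetes defaults the request of an extended resource to its limit when the request is omitted.

```shell
#!/bin/sh
# Hypothetical helper: print a Pod manifest that requests N RBLN NPUs.
#   $1 = Pod name, $2 = NPU count (defaults to 1)
npu_pod_manifest() {
    pod_name="$1"
    npu_count="${2:-1}"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${pod_name}
spec:
  containers:
    - name: ubuntu
      image: ubuntu:latest
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 300000; done;"]
      resources:
        limits:
          rebellions.ai/npu: ${npu_count}
EOF
}

# Usage (Pod name is an example):
#   npu_pod_manifest my-npu-pod 2 | kubectl apply -f -
```

The Pod stays in Pending if no node has the requested number of NPUs allocatable.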
You can create the Pod from this spec using the kubectl create command:
| $ kubectl create -f https://raw.githubusercontent.com/rebellions-sw/rbln-k8s-device-plugin/master/deployments/rbln/pod-tc.yaml
|
You can confirm that a single rebellions.ai/npu resource has been assigned to the Pod:
| $ kubectl describe pod rbln-device-plugin-testpod
Name: rbln-device-plugin-testpod
Namespace: default
...
Containers:
  ubuntu:
    ...
    Limits:
      rebellions.ai/npu: 1
    Requests:
      rebellions.ai/npu: 1
...
|
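For scripted checks, the assigned limit can be read directly once the Pod is ready. This is a sketch under assumptions: pod_npu_limit is a hypothetical helper, the 120s timeout is arbitrary, and the query reads only the first container in the Pod.

```shell
#!/bin/sh
# Hypothetical helper: wait for a Pod to become Ready, then print the
# rebellions.ai/npu limit of its first container.
pod_npu_limit() {
    target_pod="$1"
    kubectl wait --for=condition=Ready "pod/$target_pod" --timeout=120s \
        >/dev/null || return 1
    # "\." escapes the dot in the resource name for JSONPath.
    kubectl get pod "$target_pod" \
        -o jsonpath='{.spec.containers[0].resources.limits.rebellions\.ai/npu}'
}

# Usage:
#   pod_npu_limit rbln-device-plugin-testpod
```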
The device plugin automatically mounts rbln-smi from the host machine into the Pod's container. You can verify this from inside the container as below:
| $ kubectl exec -it rbln-device-plugin-testpod -- bash
root@rbln-device-plugin-testpod:/# rbln-smi
Mon May 27 08:44:05 2024
+-------------------------------------------------------------------------------------------------+
| Device Infomation KMD ver: 0.12.37-6eed3e9-release |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU | Name | Device | PCI BUS ID | Temp | Power | Memory(used/total) | Util |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0 | RBLN-CA02 | rbln0 | 0000:b6:00.0 | 25C | 6.2W | 0.0B / 15.7GiB | 0.0 |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
| Context Infomation |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
|
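The same check can be run without an interactive shell, which is convenient in CI or smoke tests. A minimal sketch follows; run_rbln_smi_in_pod and delete_test_pod are hypothetical helper names that default to the test Pod created above.

```shell
#!/bin/sh
# Hypothetical helper: run rbln-smi inside the Pod without opening a shell.
run_rbln_smi_in_pod() {
    smi_pod="${1:-rbln-device-plugin-testpod}"
    kubectl exec "$smi_pod" -- rbln-smi
}

# Hypothetical helper: remove the test Pod once the check is done.
delete_test_pod() {
    old_pod="${1:-rbln-device-plugin-testpod}"
    kubectl delete pod "$old_pod"
}

# Usage:
#   run_rbln_smi_in_pod && delete_test_pod
```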