Kubernetes Support
The RBLN SDK provides a Kubernetes device plugin to support RBLN NPUs in a Kubernetes cluster environment.
Step 1. Prepare NPU nodes
First, prepare Kubernetes nodes equipped with RBLN NPUs and install the RBLN Driver on them. On a cloud server, the stable version of the RBLN Driver is typically already installed. If you can see the RBLN NPUs by running rbln-stat, a command-line interface (CLI) utility included in the RBLN Driver package, you can skip the driver installation. For more information about the installation, please refer to the Installation Guide.
| $ rbln-stat
Mon May 27 03:28:44 2024
+-------------------------------------------------------------------------------------------------+
| Device Infomation                                              KMD ver: 0.12.37-6eed3e9-release |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU | Name      | Device    | PCI BUS ID    | Temp | Power   | Memory(used/total)       | Util  |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0   | RBLN-CA02 | rbln0     | 0000:b6:00.0  | 25C  | 6.2W    | 0.0B / 15.7GiB           | 0.0   |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
| Context Infomation                                                                              |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process             | PID          | CTX | Priority | PTID | Memalloc            | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A                 | N/A          | N/A | N/A      | N/A  | N/A                 | N/A    |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
|
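The check described in this step can also be scripted. The following is a minimal sketch, assuming only that rbln-stat is on the PATH once the RBLN Driver is installed; the helper name driver_cli_available is hypothetical and not part of the RBLN SDK.

```python
import shutil
import subprocess


def driver_cli_available(command: str = "rbln-stat") -> bool:
    """Return True if the given CLI is found on PATH (hypothetical helper)."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if driver_cli_available():
        # Print the device table; a non-zero exit code raises CalledProcessError.
        subprocess.run(["rbln-stat"], check=True)
    else:
        print("rbln-stat not found; the RBLN Driver may not be installed on this node.")
```

Finding the binary only shows that the CLI is present; the device table it prints is what confirms that the NPUs are visible.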
Step 2. Install Device Plugin
Next, install the device plugin with the following kubectl apply commands:
| $ kubectl apply -f https://raw.githubusercontent.com/rebellions-sw/rebel-k8s-device-plugin/master/deployments/rebel/configmap.yaml
$ kubectl apply -f https://raw.githubusercontent.com/rebellions-sw/rebel-k8s-device-plugin/master/deployments/rebel/daemonset.yaml
|
You can see the DaemonSet rebel-device-plugin under the kube-system namespace, and the Pods created from the DaemonSet, as below:
| $ kubectl get ds -n kube-system rebel-device-plugin
NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
rebel-device-plugin   4         4         4       4            4           kubernetes.io/arch=amd64   26s
$ kubectl get pod -n kube-system -l name=rebel-device-plugin -o wide
NAME                        READY   STATUS    RESTARTS   AGE
rebel-device-plugin-5tj4z   1/1     Running   0          41s
rebel-device-plugin-7zqgh   1/1     Running   0          41s
rebel-device-plugin-ddtts   1/1     Running   0          41s
rebel-device-plugin-zrw8s   1/1     Running   0          41s
|
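The readiness shown in the DESIRED and READY columns can also be checked programmatically. This is a minimal sketch, assuming the DaemonSet is exported with kubectl get ds -n kube-system rebel-device-plugin -o json; it relies on the standard DaemonSet status fields desiredNumberScheduled and numberReady. The helper name daemonset_ready and the trimmed sample document are illustrative only.

```python
import json

# Sample shaped like `kubectl get ds ... -o json`, trimmed to the status
# fields used below; the numbers mirror the output shown above.
SAMPLE_DS_JSON = """
{
  "metadata": {"name": "rebel-device-plugin", "namespace": "kube-system"},
  "status": {"desiredNumberScheduled": 4, "numberReady": 4}
}
"""


def daemonset_ready(ds: dict) -> bool:
    """Return True when every scheduled plugin Pod reports ready."""
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    # Zero desired Pods means no node matched the node selector.
    return desired > 0 and ready == desired


print(daemonset_ready(json.loads(SAMPLE_DS_JSON)))  # True
```

Treating a desired count of zero as not ready is a design choice in this sketch: it flags the case where the node selector matches no node.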
You can also check the RBLN NPU resources (rebellions.ai/ATOM) with the kubectl describe command as below:
| $ kubectl describe node <your-node-name>
...
Capacity:
  ...
  rebellions.ai/ATOM: 1
Allocatable:
  ...
  rebellions.ai/ATOM: 1
...
|
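The same Capacity and Allocatable values are available in machine-readable form under status.capacity and status.allocatable of the Node object. This is a minimal sketch, assuming the node is exported with kubectl get node <your-node-name> -o json; the helper name allocatable_npus and the sample dictionary are hypothetical.

```python
def allocatable_npus(node: dict, resource: str = "rebellions.ai/ATOM") -> int:
    """Return the allocatable count of the given extended resource on a node."""
    allocatable = node.get("status", {}).get("allocatable", {})
    # Kubernetes serializes resource quantities as strings, e.g. "1".
    return int(allocatable.get(resource, 0))


# Sample trimmed to the fields used above, mirroring the describe output.
sample_node = {
    "status": {
        "capacity": {"rebellions.ai/ATOM": "1"},
        "allocatable": {"rebellions.ai/ATOM": "1"},
    }
}

print(allocatable_npus(sample_node))  # 1
```

A result of 0 means that the node does not advertise the resource, for example because the device plugin Pod is not running on that node.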
Device plugin configuration
The device plugin requires a ConfigMap that defines the resources it advertises (e.g., rebellions.ai/ATOM for the resource name). Here is the default ConfigMap:
| apiVersion: v1
kind: ConfigMap
metadata:
  name: rebel-device-plugin-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourceName": "ATOM",
          "resourcePrefix": "rebellions.ai",
          "deviceType": "accelerator",
          "selectors": {
            "vendors": ["1eff"],
            "devices": [
              "0010",
              "0011",
              "1020",
              "1021",
              "1120",
              "1121",
              "1220",
              "1221"
            ],
            "drivers": ["rebellions"]
          }
        }
      ]
    }
|
The device plugin searches every node for the resources defined in the ConfigMap. If you want to provide a node-specific configuration in a heterogeneous cluster, you will need to modify both your ConfigMap and your DaemonSet. For more details, please refer to the link.
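To illustrate how the configuration relates to the advertised resource name, the sketch below reads a trimmed copy of config.json and joins resourcePrefix and resourceName into rebellions.ai/ATOM. The matching rule in device_matches is an assumption made for illustration (a device is selected when its PCI vendor ID, device ID, and bound driver each appear in the corresponding list); it is not taken from the plugin source, and both helper names are hypothetical.

```python
import json

# Trimmed copy of the default config.json (device list shortened).
CONFIG_JSON = """
{
  "resourceList": [
    {
      "resourceName": "ATOM",
      "resourcePrefix": "rebellions.ai",
      "deviceType": "accelerator",
      "selectors": {
        "vendors": ["1eff"],
        "devices": ["0010", "0011", "1020", "1021"],
        "drivers": ["rebellions"]
      }
    }
  ]
}
"""


def resource_names(config: dict) -> list:
    """Build '<resourcePrefix>/<resourceName>' for every configured resource."""
    return [
        f"{res['resourcePrefix']}/{res['resourceName']}"
        for res in config["resourceList"]
    ]


def device_matches(selectors: dict, vendor: str, device: str, driver: str) -> bool:
    """Assumed rule: every selector list must contain the device's value."""
    return (
        vendor in selectors["vendors"]
        and device in selectors["devices"]
        and driver in selectors["drivers"]
    )


config = json.loads(CONFIG_JSON)
selectors = config["resourceList"][0]["selectors"]
print(resource_names(config))  # ['rebellions.ai/ATOM']
print(device_matches(selectors, "1eff", "0010", "rebellions"))  # True
```

Parsing config.json this way before applying an edited ConfigMap also catches JSON syntax errors early.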
Step 3. Create a Pod with NPUs
To create a Pod with NPU resources, add spec.containers[].resources.limits to your Pod spec as below:
| apiVersion: v1
kind: Pod
metadata:
  name: rebel-device-plugin-testpod
spec:
  containers:
    - name: ubuntu
      image: ubuntu:latest
      imagePullPolicy: IfNotPresent
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 300000; done;"]
      resources:
        requests:
          rebellions.ai/ATOM: 1
        limits:
          rebellions.ai/ATOM: 1
|
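When the number of NPUs varies per workload, the manifest can be generated instead of edited by hand. This is a minimal sketch that builds the same Pod as a Python dictionary and prints it as JSON, which kubectl also accepts; the function name npu_pod_manifest and its parameters are hypothetical. Requests and limits are set to the same value because Kubernetes requires them to be equal for extended resources when both are given.

```python
import json


def npu_pod_manifest(name: str, npu_count: int = 1,
                     image: str = "ubuntu:latest",
                     resource: str = "rebellions.ai/ATOM") -> dict:
    """Build a Pod manifest requesting `npu_count` NPUs (hypothetical helper)."""
    if npu_count < 1:
        raise ValueError("npu_count must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "ubuntu",
                    "image": image,
                    "imagePullPolicy": "IfNotPresent",
                    "command": ["/bin/bash", "-c", "--"],
                    "args": ["while true; do sleep 300000; done;"],
                    "resources": {
                        # Extended resources cannot be overcommitted, so the
                        # request and the limit carry the same quantity.
                        "requests": {resource: npu_count},
                        "limits": {resource: npu_count},
                    },
                }
            ]
        },
    }


manifest = npu_pod_manifest("rebel-device-plugin-testpod", npu_count=1)
print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as make_pod.py (an arbitrary name), its output can be piped to the cluster with python3 make_pod.py | kubectl create -f -.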
You can create a Pod from this Pod spec using the kubectl create command:
| $ kubectl create -f https://raw.githubusercontent.com/rebellions-sw/rebel-k8s-device-plugin/master/deployments/rebel/pod-tc.yaml
|
A single rebellions.ai/ATOM resource is then assigned to the Pod:
| $ kubectl describe pod rebel-device-plugin-testpod
Name:         rebel-device-plugin-testpod
Namespace:    default
...
Containers:
  ubuntu:
    ...
    Limits:
      rebellions.ai/ATOM: 1
    Requests:
      rebellions.ai/ATOM: 1
...
|
The device plugin automatically mounts rbln-stat from the host machine into the Pod's container. You can verify this inside the container as below:
| $ kubectl exec -it rebel-device-plugin-testpod -- bash
root@rebel-device-plugin-testpod:/# rbln-stat
Mon May 27 08:44:05 2024
+-------------------------------------------------------------------------------------------------+
| Device Infomation                                              KMD ver: 0.12.37-6eed3e9-release |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU | Name      | Device    | PCI BUS ID    | Temp | Power   | Memory(used/total)       | Util  |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0   | RBLN-CA02 | rbln0     | 0000:b6:00.0  | 25C  | 6.2W    | 0.0B / 15.7GiB           | 0.0   |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
| Context Infomation                                                                              |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process             | PID          | CTX | Priority | PTID | Memalloc            | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A                 | N/A          | N/A | N/A      | N/A  | N/A                 | N/A    |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
|
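For automated checks, the device table can be parsed rather than read by eye. This is a minimal sketch, assuming the text layout of rbln-stat matches the sample shown here; the layout may differ between driver versions, and the function name parse_device_rows is hypothetical. The input text can be captured non-interactively with kubectl exec rebel-device-plugin-testpod -- rbln-stat.

```python
from typing import Dict, List

# Shortened sample of the rbln-stat output shown above.
SAMPLE_OUTPUT = """\
Mon May 27 08:44:05 2024
+-----+-----------+--------+--------------+------+-------+--------------------+------+
| NPU | Name      | Device | PCI BUS ID   | Temp | Power | Memory(used/total) | Util |
+-----+-----------+--------+--------------+------+-------+--------------------+------+
| 0   | RBLN-CA02 | rbln0  | 0000:b6:00.0 | 25C  | 6.2W  | 0.0B / 15.7GiB     | 0.0  |
+-----+-----------+--------+--------------+------+-------+--------------------+------+
| NPU | Process   | PID    | CTX          | Priority | PTID | Memalloc        | Status |
| N/A | N/A       | N/A    | N/A          | N/A      | N/A  | N/A             | N/A    |
"""


def parse_device_rows(output: str) -> List[Dict[str, str]]:
    """Return one dict per NPU row of the device table."""
    header = None
    rows = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if cells[0] == "NPU":
            # Both tables start with "NPU"; only the device table has "Name".
            header = cells if "Name" in cells else None
        elif header and len(cells) == len(header) and cells[0].isdigit():
            rows.append(dict(zip(header, cells)))
    return rows


for row in parse_device_rows(SAMPLE_OUTPUT):
    print(row["Device"], row["Name"], row["PCI BUS ID"])
```

The number of parsed rows should equal the number of rebellions.ai/ATOM resources requested by the Pod.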