Kubernetes Support

The RBLN SDK provides a Kubernetes Device Plugin to support RBLN NPUs in a Kubernetes cluster environment.

Step 1. Prepare NPU nodes

First, prepare Kubernetes nodes equipped with RBLN NPUs and install the RBLN Driver on them. Typically, a stable version of the RBLN Driver is already installed on the cloud server you are using. The RBLN Driver package includes rbln-stat, a command-line interface (CLI) utility. If running rbln-stat shows the RBLN NPUs, you can skip the driver installation. For more information about installation, please refer to the Installation Guide.

$ rbln-stat
Mon May 27 03:28:44 2024
+-------------------------------------------------------------------------------------------------+
|                        Device Infomation KMD ver: 0.12.37-6eed3e9-release                       |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU |    Name   | Device    |   PCI BUS ID  | Temp |  Power  |    Memory(used/total)    |  Util |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0   | RBLN-CA02 | rbln0     |  0000:b6:00.0 |  25C |   6.2W  |      0.0B / 15.7GiB      |   0.0 |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
|                                        Context Infomation                                       |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process             |     PID      | CTX | Priority | PTID |            Memalloc | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A                 |     N/A      | N/A |   N/A    | N/A  |                 N/A |  N/A   |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
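
The check described above can also be scripted. The following is a minimal sketch, not part of the RBLN SDK: the function name rbln_driver_ready and the RBLN_STAT_BIN override are hypothetical names introduced for illustration, and the sketch assumes that rbln-stat exits with a non-zero status when it cannot query the devices.

```shell
# Sketch only: rbln_driver_ready and RBLN_STAT_BIN are hypothetical names.
# Succeeds when the rbln-stat CLI is on PATH and runs without error
# (assumes rbln-stat returns a non-zero status on failure).
rbln_driver_ready() {
    bin="${RBLN_STAT_BIN:-rbln-stat}"
    command -v "$bin" >/dev/null 2>&1 && "$bin" >/dev/null 2>&1
}

if rbln_driver_ready; then
    echo "rbln-stat works: the RBLN Driver installation can be skipped"
else
    echo "rbln-stat is missing or failed: install the RBLN Driver first"
fi
```

Running this on each NPU node before installing the device plugin gives a quick indication of which nodes still need the driver.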

Step 2. Install Device Plugin

The next step is to install the device plugin. The following kubectl apply commands install it:

$ kubectl apply -f https://raw.githubusercontent.com/rebellions-sw/rebel-k8s-device-plugin/master/deployments/rebel/configmap.yaml
$ kubectl apply -f https://raw.githubusercontent.com/rebellions-sw/rebel-k8s-device-plugin/master/deployments/rebel/daemonset.yaml

You can see the DaemonSet rebel-device-plugin in the kube-system namespace, and the Pods created from the DaemonSet, as follows:

$ kubectl get ds -n kube-system rebel-device-plugin
NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
rebel-device-plugin   4         4         4       4            4           kubernetes.io/arch=amd64   26s

$ kubectl get pod -n kube-system -l name=rebel-device-plugin
NAME                        READY   STATUS    RESTARTS   AGE
rebel-device-plugin-5tj4z   1/1     Running   0          41s
rebel-device-plugin-7zqgh   1/1     Running   0          41s
rebel-device-plugin-ddtts   1/1     Running   0          41s
rebel-device-plugin-zrw8s   1/1     Running   0          41s
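
The readiness check above can be automated by comparing the DESIRED and READY columns. This is a minimal sketch under stated assumptions: ds_ready is a hypothetical helper name, and it relies on the column order shown in the kubectl get ds output above.

```shell
# Sketch only: ds_ready is a hypothetical helper, not part of kubectl.
# Reads `kubectl get ds` output on stdin and succeeds when the first
# DaemonSet row has a non-zero DESIRED count equal to its READY count.
ds_ready() {
    awk 'NR == 2 { ok = ($2 > 0 && $2 == $4) } END { exit (ok ? 0 : 1) }'
}

# Usage against a live cluster:
#   kubectl get ds -n kube-system rebel-device-plugin | ds_ready && echo "ready"
```

As an alternative, kubectl rollout status ds/rebel-device-plugin -n kube-system blocks until the DaemonSet rollout completes.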

You can also check the RBLN NPU resources (rebellions.ai/ATOM) on a node with the kubectl describe command as follows:

$ kubectl describe node <your-node-name>
...
Capacity:
  ...
  rebellions.ai/ATOM:  1
Allocatable:
  ...
  rebellions.ai/ATOM:  1
...
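
To read the allocatable NPU count from a script, the describe output can be filtered. The following is a minimal sketch: atom_allocatable is a hypothetical helper name, and it assumes the layout of the kubectl describe node output shown above, with an unindented Allocatable: header followed by indented resource lines.

```shell
# Sketch only: atom_allocatable is a hypothetical helper.
# Reads `kubectl describe node` output on stdin and prints the
# rebellions.ai/ATOM count listed under the Allocatable section.
atom_allocatable() {
    awk '
        /^Allocatable:/ { in_alloc = 1; next }   # enter the Allocatable section
        /^[^ ]/         { in_alloc = 0 }         # any other unindented line ends it
        in_alloc && $1 == "rebellions.ai/ATOM:" { print $2 }
    '
}

# Usage against a live cluster:
#   kubectl describe node <your-node-name> | atom_allocatable
```

The helper prints nothing when the node does not advertise the resource, which a calling script can treat as zero NPUs.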

Device plugin configuration

The device plugin requires a ConfigMap that configures the resources it advertises (for example, rebellions.ai/ATOM as the resource name). Here is the default ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rebel-device-plugin-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [
            {
                "resourceName": "ATOM",
                "resourcePrefix": "rebellions.ai",
                "deviceType": "accelerator",
                "selectors": {
                    "vendors": ["1eff"],
                    "devices": [
                        "0010",
                        "0011",
                        "1020",
                        "1021",
                        "1120",
                        "1121",
                        "1220",
                        "1221"
                    ],
                    "drivers": ["rebellions"]
                }
            }
        ]
    }

The device plugin searches every node for the configured resources, based on the configuration in the ConfigMap. To provide a node-specific configuration in a heterogeneous cluster, you need to modify both the ConfigMap and the DaemonSet. For more details, please refer to the link.
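
To illustrate how the default configuration maps to a resource, the sketch below composes the resource name from resourcePrefix and resourceName and tests a PCI ID against the default selectors. It is not the plugin's implementation. The helper names are hypothetical, and it assumes that vendors and devices are PCI vendor and device IDs and that a device must match both lists. The drivers selector is not checked.

```shell
# Sketch only: helper names are hypothetical. Assumes "vendors" and "devices"
# are PCI vendor and device IDs and that a device must match both lists.
# The "drivers" selector is not checked here.
resource_name() {
    # $1 = resourcePrefix, $2 = resourceName
    printf '%s/%s\n' "$1" "$2"
}

matches_default_selectors() {
    # $1 = PCI ID in vendor:device form, for example 1eff:1020
    vendor=${1%%:*}
    device=${1##*:}
    [ "$vendor" = "1eff" ] || return 1
    case "$device" in
        0010|0011|1020|1021|1120|1121|1220|1221) return 0 ;;
        *) return 1 ;;
    esac
}

resource_name "rebellions.ai" "ATOM"   # prints rebellions.ai/ATOM
```

On a node with pciutils installed, lspci -nn -d 1eff: lists the PCI devices with that vendor ID, which can help when adapting the devices list for a heterogeneous cluster.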

Step 3. Create a Pod with NPUs

To create a Pod with NPU resources, add the resource under spec.containers[].resources.limits in your Pod spec as follows:

apiVersion: v1
kind: Pod
metadata:
  name: rebel-device-plugin-testpod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    imagePullPolicy: IfNotPresent
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 300000; done;"]
    resources:
      requests:
        rebellions.ai/ATOM: 1
      limits:
        rebellions.ai/ATOM: 1

You can create a Pod from this Pod spec using the kubectl create command:

$ kubectl create -f https://raw.githubusercontent.com/rebellions-sw/rebel-k8s-device-plugin/master/deployments/rebel/pod-tc.yaml
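
When the Pod name or NPU count varies, the manifest can be generated instead of edited by hand. This is a minimal sketch: npu_pod_manifest is a hypothetical helper name, and the manifest mirrors the Pod spec above. It sets only limits, because Kubernetes uses the limit as the request for extended resources when no request is given.

```shell
# Sketch only: npu_pod_manifest is a hypothetical helper.
# Prints a Pod manifest that asks for the given number of NPUs.
# $1 = Pod name, $2 = NPU count (defaults to 1)
npu_pod_manifest() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 300000; done;"]
    resources:
      limits:
        rebellions.ai/ATOM: ${2:-1}
EOF
}

# Usage against a live cluster:
#   npu_pod_manifest my-npu-pod 2 | kubectl create -f -
```

A Pod that asks for more NPUs than any node has allocatable stays in the Pending state until a suitable node becomes available.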

A single rebellions.ai/ATOM resource has been assigned to the Pod:

$ kubectl describe pod rebel-device-plugin-testpod
Name:             rebel-device-plugin-testpod
Namespace:        default
...
Containers:
  ubuntu:
    ...
    Limits:
      rebellions.ai/ATOM:  1
    Requests:
      rebellions.ai/ATOM:  1
    ...

The device plugin automatically mounts rbln-stat from the host machine into the Pod container. You can check this from inside the container as follows:

$ kubectl exec -it rebel-device-plugin-testpod -- bash
root@rebel-device-plugin-testpod:/# rbln-stat
Mon May 27 08:44:05 2024
+-------------------------------------------------------------------------------------------------+
|                        Device Infomation KMD ver: 0.12.37-6eed3e9-release                       |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU |    Name   | Device    |   PCI BUS ID  | Temp |  Power  |    Memory(used/total)    |  Util |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0   | RBLN-CA02 | rbln0     |  0000:b6:00.0 |  25C |   6.2W  |      0.0B / 15.7GiB      |   0.0 |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
|                                        Context Infomation                                       |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process             |     PID      | CTX | Priority | PTID |            Memalloc | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A                 |     N/A      | N/A |   N/A    | N/A  |                 N/A |  N/A   |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
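
To confirm from a script that the container sees the expected number of NPUs, the device rows of the rbln-stat table can be counted. The following is a minimal sketch: count_npus is a hypothetical helper name, and it assumes the table layout shown above, where the third column of each device row holds a device name such as rbln0.

```shell
# Sketch only: count_npus is a hypothetical helper.
# Reads rbln-stat output on stdin and prints the number of device rows,
# identified by a third table column of the form rbln<N>.
count_npus() {
    awk -F'|' '$4 ~ /^ *rbln[0-9]+ *$/ { n++ } END { print n + 0 }'
}

# Usage against a live cluster:
#   kubectl exec rebel-device-plugin-testpod -- rbln-stat | count_npus
```

The printed count can then be compared with the rebellions.ai/ATOM limit in the Pod spec.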