RBLN NPU DRA Driver

Overview

The RBLN NPU DRA Driver enables RBLN NPUs to be used in Kubernetes clusters through the Dynamic Resource Allocation (DRA) framework.

Unlike the Kubernetes Device Plugin, which exposes devices as fixed node resources, DRA allows workloads to request devices dynamically based on specific requirements such as product type, NUMA locality, or PCIe topology.

This is achieved through Kubernetes Resource API (resource.k8s.io) objects, which represent device inventory and selection criteria.

As a result, NPUs can be scheduled more flexibly and precisely in complex environments.

Key Capabilities

  • Advertise node-level NPU device inventory using ResourceSlice
  • Define device types and selection constraints using DeviceClass
  • Request devices using ResourceClaim or ResourceClaimTemplate, specifying quantity and selection criteria
  • Support attribute-based selection using CEL expressions (selectors.cel.expression) for device properties such as product name, NUMA, PCIe topology, and UUID

Recommendation: If you use the DRA Driver, we recommend disabling the Kubernetes Device Plugin to avoid duplicate exposure and debugging confusion.


Deployment

The NPU DRA Driver is deployed through the RBLN NPU Operator.

If the operator is already installed (v0.3.0 or later), DRA mode can be enabled by updating Helm values.

Step 1. Prerequisites

  • Kubernetes v1.34.0 or later
  • RBLN NPU Operator version v0.3.0 or later
  • RBLN Container Toolkit enabled (for CDI-based device injection)

Step 2. Enable DRA mode

Enable DRA mode by setting draKubeletPlugin.enabled=true and devicePlugin.enabled=false, as shown below.

Do not run the NPU DRA Driver and the Kubernetes Device Plugin at the same time: both would expose the same devices, which can lead to duplicate device exposure and unpredictable behavior. If you are currently using the Device Plugin, disable it before enabling the NPU DRA Driver.

# values-dra.yaml
draKubeletPlugin:
  enabled: true

devicePlugin:
  enabled: false

Step 3. Upgrade the Operator

helm upgrade <release-name> rebellions/rbln-npu-operator \
  -n <namespace> \
  -f values-dra.yaml

Installation Verification

When DRA mode works correctly, the following objects are created/updated in the cluster:

  • DeviceClass (for example, npu.rebellions.ai)
  • ResourceSlice (node-level NPU inventory)
  • ResourceClaim (or template-based claim) when requested by workloads

Check ResourceSlice

kubectl get resourceslices

ResourceSlice is a cluster-scoped resource, so no namespace flag is needed. Use the following command to inspect a specific ResourceSlice in detail:

kubectl describe resourceslice <name>

Tip: Paths such as device.attributes["npu.rebellions.ai"]... are easy to get wrong, so check the attribute keys and structure in the kubectl describe resourceslice ... output before writing selector (CEL) expressions.


Core Resource Model

DRA-based workloads follow this flow:

  1. DeviceClass: Defines the type of device to use (default: npu.rebellions.ai)
  2. ResourceSlice: Exposes available devices and their attributes
  3. ResourceClaim: Declares the devices requested by a Pod
  4. Pod: Consumes the ResourceClaim via resources.claims
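
To make the first step of this flow concrete, a DeviceClass for this driver might look like the sketch below. The operator creates the real object. The selector shown here is an assumption: it presumes the driver registers under the name npu.rebellions.ai (the same string used in the attribute paths on this page). Check the actual object with kubectl get deviceclass npu.rebellions.ai -o yaml before relying on it.

```yaml
# Illustrative sketch only; the operator-managed DeviceClass may differ.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: npu.rebellions.ai
spec:
  selectors:
  - cel:
      # Assumption: the DRA driver name equals the DeviceClass name.
      expression: device.driver == "npu.rebellions.ai"
```

A ResourceClaim then refers to this class through deviceClassName, and the scheduler considers only devices that satisfy the class selectors.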

NPU Properties (ResourceSlice Attributes)

Each ResourceSlice represents the NPU devices available on a node, including their attributes.

These attributes can be used in selector expressions (selectors.cel.expression) to control how devices are allocated to workloads.

Name                            | Type   | Example Value                        | Description
driverVersion                   | string | 3.0.0                                | Installed NPU driver version
firmwareVersion                 | string | 3.0.0                                | Device firmware version
pciDeviceID                     | string | 0x1250                               | PCI device ID
pciLinkSpeed                    | string | 32.0GT/s                             | PCIe link speed
pciLinkWidth                    | string | 16                                   | PCIe link lane width
productName                     | string | RBLN-CA25                            | NPU product name
sid                             | string | 0000000022527010                     | Card/board identifier
type                            | string | npu                                  | Device type
uuid                            | string | 55668c63-d739-4193-8212-ad7ba933520c | Unique device identifier
resource.k8s.io/numaNode        | int    | 0                                    | NUMA node connected to the device
resource.kubernetes.io/pciBusID | string | 0000:46:00.0                         | PCI bus address
resource.kubernetes.io/pcieRoot | string | 0000:00:00.0                         | PCIe root address
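
As a rough guide to how these attributes appear in the API, the sketch below shows the general shape of a ResourceSlice in resource.k8s.io/v1. The object name, node name, pool values, and device name (worker-0, npu-0) are hypothetical placeholders, and only a few attributes from the table are shown. The real object published by the driver is the authoritative reference.

```yaml
# Illustrative shape only; names and pool values are hypothetical.
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: worker-0-npu-example          # hypothetical name
spec:
  driver: npu.rebellions.ai           # assumed driver name
  nodeName: worker-0                  # hypothetical node
  pool:
    name: worker-0
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: npu-0                       # hypothetical device name
    attributes:
      productName:
        string: RBLN-CA25
      uuid:
        string: 55668c63-d739-4193-8212-ad7ba933520c
      resource.k8s.io/numaNode:       # qualified name as listed in the table
        int: 0
      resource.kubernetes.io/pcieRoot:
        string: "0000:00:00.0"
```

In CEL, unqualified attributes such as productName are read through the driver's domain, for example device.attributes["npu.rebellions.ai"].productName, while a qualified attribute is read through its own domain, for example device.attributes["resource.kubernetes.io"].pcieRoot.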

Quick Start

The examples below show the simplest pattern for allocating NPUs with a ResourceClaimTemplate and a Pod.

1) Deploy a Pod that allocates 2 RBLN NPUs

Use exactly.count to request multiple devices.

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: double-npu
spec:
  spec:
    devices:
      requests:
      - name: npus
        exactly:
          deviceClassName: npu.rebellions.ai
          count: 2
---
apiVersion: v1
kind: Pod
metadata:
  name: pod0
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: npus
  resourceClaims:
  - name: npus
    resourceClaimTemplateName: double-npu

Save the manifest above as double-npu.yaml, then apply it and check the Pod:

kubectl apply -f double-npu.yaml
kubectl get pod pod0
kubectl describe pod pod0

2) Deploy a Pod that allocates 1 RBLN-CA25 NPU

Use selectors (CEL) to select resources based on ResourceSlice attributes.

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-npu
spec:
  spec:
    devices:
      requests:
      - name: npus
        exactly:
          deviceClassName: npu.rebellions.ai
          count: 1
          selectors:
          - cel:
              expression: device.attributes["npu.rebellions.ai"].productName == "RBLN-CA25"
---
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: npus
  resourceClaims:
  - name: npus
    resourceClaimTemplateName: single-npu

Save the manifest above as single-npu.yaml, then apply it and check the Pod:

kubectl apply -f single-npu.yaml
kubectl get pod pod1
kubectl describe pod pod1
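
Both examples above use a ResourceClaimTemplate, which generates a separate claim for each Pod. For comparison, the sketch below uses a standalone ResourceClaim that a Pod references by name through resourceClaimName. The claim name shared-npu and the Pod name pod2 are hypothetical, and the claim must live in the same namespace as the Pod.

```yaml
# Sketch: a pre-created ResourceClaim referenced by name (names are hypothetical).
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: shared-npu
spec:
  devices:
    requests:
    - name: npus
      exactly:
        deviceClassName: npu.rebellions.ai
        count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: npus
  resourceClaims:
  - name: npus
    # References the existing claim instead of a template.
    resourceClaimName: shared-npu
```

A standalone claim outlives the Pod and can be inspected with kubectl get resourceclaim shared-npu, whereas a template-generated claim is tied to the lifetime of its Pod.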

Examples: Common selector and constraint patterns

Allocate devices from the same card using the PCIe Root ID

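The RBLN-CA25 integrates four NPU chips on a single card. When allocating two or more NPUs, ensure they are assigned from the same card to prevent performance degradation. The matchAttribute constraint below requires every device allocated for the npus request to report the same resource.kubernetes.io/pcieRoot value.
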
constraints:
- requests: ["npus"]
  matchAttribute: resource.kubernetes.io/pcieRoot
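
The fragment above omits its surrounding context. As a sketch of where it belongs, constraints sits next to requests under devices in the claim spec. The template name same-card-npus is hypothetical.

```yaml
# Sketch: two NPUs that must share the same PCIe root (template name is hypothetical).
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: same-card-npus
spec:
  spec:
    devices:
      requests:
      - name: npus
        exactly:
          deviceClassName: npu.rebellions.ai
          count: 2
      # constraints is a sibling of requests, not a field of an individual request.
      constraints:
      - requests: ["npus"]
        matchAttribute: resource.kubernetes.io/pcieRoot
```

If no two free devices share a PCIe root, the claim cannot be allocated and the Pod stays Pending until matching devices become available.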

Pin to a single device by UUID

selectors:
- cel:
    expression: device.attributes["npu.rebellions.ai"].uuid == "55668c63-d739-4193-8212-ad7ba933520c"
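
Selectors can also be combined. The sketch below lists two selectors on one request; a device must satisfy every selector in the list to be eligible. The template name ca25-numa0 is hypothetical, and the numaNode path assumes the attribute is published under the qualified name shown in the attribute table (resource.k8s.io/numaNode), so confirm it against the kubectl describe resourceslice output first.

```yaml
# Sketch: one RBLN-CA25 NPU attached to NUMA node 0 (template name is hypothetical).
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: ca25-numa0
spec:
  spec:
    devices:
      requests:
      - name: npus
        exactly:
          deviceClassName: npu.rebellions.ai
          count: 1
          selectors:
          - cel:
              expression: device.attributes["npu.rebellions.ai"].productName == "RBLN-CA25"
          - cel:
              # Assumption: numaNode is exposed under the resource.k8s.io domain.
              expression: device.attributes["resource.k8s.io"].numaNode == 0
```

The same condition could be written as a single expression joined with &&. Separate entries keep each criterion easier to read and change.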