RBLN NPU DRA Driver
Overview
The RBLN NPU DRA Driver enables RBLN NPUs to be used in Kubernetes clusters through the Dynamic Resource Allocation (DRA) framework.
Unlike the Kubernetes Device Plugin, which exposes devices as fixed node resources, DRA allows workloads to request devices dynamically based on specific requirements such as product type, NUMA locality, or PCIe topology.
This is achieved through Kubernetes Resource API (resource.k8s.io) objects, which represent device inventory and selection criteria.
As a result, NPUs can be scheduled more flexibly and precisely in complex environments.
Key Capabilities
- Advertise node-level NPU device inventory using ResourceSlice
- Define device types and selection constraints using DeviceClass
- Request devices using ResourceClaim (Template), specifying quantity and selection criteria
- Support attribute-based selection using CEL expressions (selectors.cel.expression) for device properties such as product name, NUMA, PCIe topology, and UUID
Recommendation: If you use the DRA Driver, we recommend disabling the Kubernetes Device Plugin to avoid duplicate exposure and debugging confusion.
Deployment
The NPU DRA Driver is deployed through the RBLN NPU Operator.
If the operator is already installed (v0.3.0 or later), DRA mode can be enabled by updating Helm values.
Step 1. Prerequisites
- Kubernetes v1.34.0 or later
- RBLN NPU Operator version v0.3.0 or later
- RBLN Container Toolkit enabled (for CDI-based device injection)
Step 2. Enable DRA mode
Enable DRA mode by setting draKubeletPlugin.enabled=true in the operator's Helm values and disabling the existing Device Plugin.
The NPU DRA Driver and the Kubernetes Device Plugin cannot be used simultaneously, because both would expose the same devices, which may result in duplicate device exposure and unpredictable behavior.
If you are currently using the Device Plugin, disable it before enabling the NPU DRA Driver.
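As a rough sketch, the Helm values for this step might look like the fragment below. Only `draKubeletPlugin.enabled` is named in this guide; the `devicePlugin.enabled` key is an assumption about how the operator chart exposes the Device Plugin toggle, so check the chart's own values file for the actual key.

```yaml
# Hypothetical values fragment for the RBLN NPU Operator chart.
draKubeletPlugin:
  enabled: true    # key named in this guide: turns on the NPU DRA Driver

# ASSUMPTION: the key that disables the Device Plugin may be named differently
# in the real chart; confirm it before applying.
devicePlugin:
  enabled: false
```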
Step 3. Upgrade the Operator
Upgrade the operator's Helm release with the updated values so that the change takes effect.
Installation Verification
When DRA mode works correctly, the following objects are created/updated in the cluster:
- DeviceClass (for example, npu.rebellions.ai)
- ResourceSlice (node-level NPU inventory)
- ResourceClaim (or template-based claim) when requested by workloads
Check ResourceSlice
To inspect a specific ResourceSlice in detail, run kubectl describe resourceslice followed by its name. Running kubectl get resourceslices lists the available names.
Tip: When writing selector (CEL) expressions, it is safest to first inspect the keys and structure in the kubectl describe resourceslice ... output, especially paths like device.attributes["npu.rebellions.ai"]....
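For orientation, a ResourceSlice published by the driver might look roughly like the excerpt below. This is an illustrative sketch rather than captured output: the object name, node name, pool values, and device name are hypothetical, the attribute values are taken from the example column of the attribute table in this guide, and the assumption that unqualified attribute names fall under the `npu.rebellions.ai` driver domain follows the path shown in the tip above.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: worker-1-npu-slice          # hypothetical name
spec:
  driver: npu.rebellions.ai         # assumed driver name, matching the DeviceClass
  nodeName: worker-1                # hypothetical node
  pool:
    name: worker-1                  # hypothetical pool
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: npu-0                     # hypothetical device name
    attributes:
      productName:
        string: RBLN-CA25
      uuid:
        string: 55668c63-d739-4193-8212-ad7ba933520c
      resource.kubernetes.io/pcieRoot:
        string: "0000:00:00.0"
```

In a CEL selector, an unqualified attribute such as `productName` would then be read as `device.attributes["npu.rebellions.ai"].productName`, while a qualified one is looked up under the domain before the slash.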
Core Resource Model
DRA-based workloads follow this flow:
- DeviceClass: Defines the type of device to use (default: npu.rebellions.ai).
- ResourceSlice: Exposes available devices and attributes.
- ResourceClaim: Declares the devices requested by a pod.
- Pod: Consumes the ResourceClaim via resources.claims.
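To make the first element of this flow concrete, a DeviceClass that selects every device published by the driver could be sketched as follows. The operator is expected to create the default class itself, so this is only an illustration of the shape. The selector expression is an assumption; the actual class may be defined differently.

```yaml
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: npu.rebellions.ai
spec:
  selectors:
  - cel:
      # ASSUMPTION: the class matches all devices from the driver of the same name.
      expression: device.driver == "npu.rebellions.ai"
```

A ResourceClaim then refers to this class by name through `deviceClassName`, and a Pod refers to the claim through `spec.resourceClaims` and `resources.claims`.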
NPU Properties (ResourceSlice Attributes)
Each ResourceSlice represents the NPU devices available on a node, including their attributes.
These attributes can be used in selector expressions (selectors.cel.expression) to control how devices are allocated to workloads.
| Name | Type | Example Value | Description |
|---|---|---|---|
| driverVersion | string | 3.0.0 | Installed NPU driver version |
| firmwareVersion | string | 3.0.0 | Device firmware version |
| pciDeviceID | string | 0x1250 | PCI device ID |
| pciLinkSpeed | string | 32.0GT/s | PCIe link speed |
| pciLinkWidth | string | 16 | PCIe link lane width |
| productName | string | RBLN-CA25 | NPU product name |
| sid | string | 0000000022527010 | Card/board identifier |
| type | string | npu | Device type |
| uuid | string | 55668c63-d739-4193-8212-ad7ba933520c | Unique device identifier |
| resource.k8s.io/numaNode | int | 0 | NUMA node connected to the device |
| resource.kubernetes.io/pciBusID | string | 0000:46:00.0 | PCI bus address |
| resource.kubernetes.io/pcieRoot | string | 0000:00:00.0 | PCIe root address |
Quick Start
The example below shows the simplest pattern for allocating NPUs using ResourceClaimTemplate + Pod.
1) Deploy a Pod that allocates 2 RBLN NPUs
Use exactly.count to request multiple resources.
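A minimal sketch of this pattern is shown below, assuming the default DeviceClass `npu.rebellions.ai`. The object names, container image, and command are placeholders chosen for illustration, not values from the RBLN documentation.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: two-npus                    # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: npu                   # hypothetical request name
        exactly:
          deviceClassName: npu.rebellions.ai
          allocationMode: ExactCount
          count: 2                  # request exactly two NPUs
---
apiVersion: v1
kind: Pod
metadata:
  name: npu-pod                     # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: ubuntu:22.04             # placeholder image
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: npus                  # must match an entry in spec.resourceClaims
  resourceClaims:
  - name: npus
    resourceClaimTemplateName: two-npus
```

Each Pod created from this spec gets its own ResourceClaim generated from the template, so the two NPUs are not shared with other Pods.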
Apply the manifest with kubectl apply -f, passing the file that contains it.
2) Deploy a Pod that allocates 1 RBLN-CA25 NPU
Use selectors (CEL) to select resources based on ResourceSlice attributes.
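The following sketch filters on the `productName` attribute. It assumes that attribute is published under the `npu.rebellions.ai` domain, as suggested by the attribute path mentioned in the verification tip. All object names and the container image are placeholders.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: one-ca25                    # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: npu                   # hypothetical request name
        exactly:
          deviceClassName: npu.rebellions.ai
          selectors:
          - cel:
              # ASSUMPTION: productName lives under the npu.rebellions.ai domain.
              expression: device.attributes["npu.rebellions.ai"].productName == "RBLN-CA25"
---
apiVersion: v1
kind: Pod
metadata:
  name: ca25-pod                    # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: ubuntu:22.04             # placeholder image
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: npu
  resourceClaims:
  - name: npu
    resourceClaimTemplateName: one-ca25
```

With no `count` given, the request uses the default of one device.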
Apply the manifest with kubectl apply -f, passing the file that contains it.
Examples: Common selector/constraints patterns
Allocate devices from the same card using the PCIe Root ID
The RBLN-CA25 integrates four NPU chips on a single card. When allocating two or more NPUs, ensure they are assigned from the same card to prevent performance degradation.
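One way to express this requirement is a `matchAttribute` constraint on the PCIe root attribute, sketched below. It assumes that all NPUs on one card report the same `resource.kubernetes.io/pcieRoot` value. The claim and request names are hypothetical.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: same-card-npus              # hypothetical name
spec:
  devices:
    requests:
    - name: npu                     # hypothetical request name
      exactly:
        deviceClassName: npu.rebellions.ai
        allocationMode: ExactCount
        count: 2
    constraints:
    # Every device allocated for the "npu" request must share the same PCIe root.
    # ASSUMPTION: NPUs on one card sit under a common PCIe root.
    - requests: ["npu"]
      matchAttribute: resource.kubernetes.io/pcieRoot
```

If no single PCIe root has enough free NPUs to satisfy the count, the claim stays unallocated rather than being spread across cards.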