RBLN NPU Operator¶
The rbln-npu-operator deploys and manages the Rebellions software components needed to use the RBLN NPU family across Kubernetes and OpenShift clusters. A single RBLNClusterPolicy custom resource controls the workflow for the whole cluster. When you create a RBLNClusterPolicy custom resource, the operator automatically:
- Detect hardware and labels nodes using Node Feature Discovery (NFD)
- Expose devices for two workload modes: container and VM passthrough
- Install and maintains the NPU driver via the Driver Manager (driven by the
RBLNDrivercustom resource) - Deploy and manages the components required to use NPUs in Kubernetes environments
- Configure RBAC/SCC automatically so it works on both OpenShift and standard Kubernetes
For installation, upgrade, removal, and related operator guides, see:
- Installing the NPU Operator
- Upgrading the NPU Operator
- Uninstalling the NPU Operator
- NPU Driver Installation
- NPU Driver Upgrade Workflow
- Per-node Workload Labeling
- Sandboxed Workloads
Core Components¶
The table below summarizes the key components that the operator deploys and manages.
| Component | Description |
|---|---|
| Device Plugin | Exposes NPU resources such as rebellions.ai/npu so Pods can request the device through resources.limits. |
| NPU DRA Driver | Allocates NPUs through Kubernetes Dynamic Resource Allocation (DeviceClass / ResourceSlice / ResourceClaim). Cannot be enabled together with the Device Plugin. |
| Sandbox Device Plugin | Exposes VFIO resources for each card, such as rebellions.ai/ATOM_PT, for VM passthrough; designed for KubeVirt environments. |
| VFIO Manager | Binds and unbinds RBLN PCI functions to the host's vfio-pci driver for the Sandbox Device Plugin. |
| Driver Manager | Reconciles the kernel driver, UMD libraries, and rbln-smi on each node according to the RBLNDriver custom resource. |
| RBLN Daemon | Runs on each node as a host service used by NPU workloads (default hostPort 50051). Disabled by default; enable with rblnDaemon.enabled=true. |
| NPU Feature Discovery | Detects RBLN PCI devices (vendor 1eff) and applies attributes such as rebellions.ai/npu.present, rebellions.ai/npu.count, and rebellions.ai/npu.product as node labels. |
| Container Toolkit | Generates CDI specifications and configures the container runtime so the Device Plugin and workloads can use CDI-based device injection. |
| Metrics Exporter | Exports NPU telemetry (utilization, temperature, DRAM, power) in Prometheus format. |
| Operator Validator | Verifies driver and Container Toolkit readiness on each node and writes a readiness state file via hostPath that other components depend on. |
| Operator Manager | Reconciles the RBLNClusterPolicy and RBLNDriver CRDs and orchestrates the components above. |