Sandboxed Workloads with the Rebellions NPU Operator
Overview
The RBLN NPU Operator can expose NPUs to guest VMs through VFIO so that virtualized AI workloads achieve near-native acceleration. When sandbox mode is enabled, the operator does the following:
- Rebinds PCI devices to the VFIO driver
  The vfio-manager DaemonSet ships the vfio-manage.sh helper through a ConfigMap. It detaches each RBLN PCI function from its native driver and reattaches it to vfio-pci, making the device safe for passthrough.
- Announces VFIO-backed resources
  The sandbox-device-plugin DaemonSet scans VFIO-managed NPUs and advertises resources such as rebellions.ai/ATOM_CA22_PT and rebellions.ai/ATOM_CA25_PT. Any workload that requests resources through a Kubernetes device plugin, including KubeVirt, can consume them.
- Labels eligible nodes
  Node Feature Discovery (NFD) reports the underlying hardware (feature.node.kubernetes.io/pci-1eff.present=true). The operator labels those nodes with rebellions.ai/npu.present=true and workload-specific keys so that only nodes capable of VFIO passthrough run the sandbox components.
After those controllers reconcile, the NPUs are exposed to KubeVirt VirtualMachine objects through the rebellions.ai/* resource names referenced in each hostDevices stanza.
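As a rough illustration of the end state described above, a reconciled node might carry labels and allocatable resources along these lines. This is a sketch, not captured operator output: the node name and the resource count are hypothetical, and the workload-specific label keys are omitted because they are not listed here.

```yaml
# Illustrative excerpt of a Node object after the sandbox components reconcile.
apiVersion: v1
kind: Node
metadata:
  name: npu-worker-0                                      # hypothetical node name
  labels:
    feature.node.kubernetes.io/pci-1eff.present: "true"   # reported by NFD
    rebellions.ai/npu.present: "true"                     # added by the operator
status:
  allocatable:
    rebellions.ai/ATOM_CA22_PT: "2"                       # illustrative count of VFIO-bound NPUs
```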
Prerequisites
- Kubernetes 1.19+ cluster
- Worker nodes with RBLN NPUs (RBLN-CA12/CA22/CA25)
- IOMMU enabled in BIOS and on the kernel command line (intel_iommu=on or amd_iommu=on), with the VFIO kernel modules available (vfio, vfio_pci, vfio_iommu_type1)
- KubeVirt Operator installed and ready to schedule VMs
- Node Feature Discovery (can be deployed by the Helm chart itself)
Helm Deployment for Sandboxed Workloads
- Install Helm (if necessary)
- Add the Rebellions chart repository
- Configure the sandbox workload profile
  The chart ships with a ready-made example at sample-values-SandboxWorkload.yaml. It enables the VFIO Manager and the Sandbox Device Plugin and sets suitable resource names.
  You can also copy the base values.yaml and toggle the relevant keys manually:
  - Set sandboxDevicePlugin.enabled=true
  - Set vfioManager.enabled=true
  - Adjust sandboxDevicePlugin.resourceList[] to match each card model and VFIO resource name your VMs expect
  - Ensure nfd.enabled=true if NFD is not already running
- Install with the sandbox profile
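The manual toggles from the configuration step can be collected into a small values override. The following is a minimal sketch rather than the chart's shipped sample: the file name values-sandbox.yaml is hypothetical, and resourceList is left as a comment because its entry schema is not shown in this guide.

```yaml
# values-sandbox.yaml (hypothetical file name)
vfioManager:
  enabled: true          # rebinds RBLN PCI functions to vfio-pci
sandboxDevicePlugin:
  enabled: true          # advertises the VFIO-backed rebellions.ai/* resources
  # resourceList: copy the entries from sample-values-SandboxWorkload.yaml and
  # adjust them to the card models and resource names your VMs expect.
nfd:
  enabled: true          # set to false if NFD is already running in the cluster
```

A file like this would be passed to Helm with the -f (--values) flag when installing the chart.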
Consuming the VFIO Resources from KubeVirt
Note
Enable KubeVirt's HostDevices feature gate and list each Rebellions PCI resource under permittedHostDevices.pciHostDevices before attaching them to VMs:
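A KubeVirt custom resource with this configuration might look like the sketch below. The resource name and namespace (kubevirt/kubevirt) follow the common default install and are assumptions. The vendor ID 1eff is taken from the NFD label mentioned in the overview, while the device ID is a placeholder that you would look up on the node (for example with lspci -nn).

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt              # assumed default name
  namespace: kubevirt         # assumed default namespace
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - HostDevices
    permittedHostDevices:
      pciHostDevices:
        - pciVendorSelector: "1eff:<device-id>"      # placeholder: replace with the real vendor:device ID
          resourceName: rebellions.ai/ATOM_CA22_PT
          externalResourceProvider: true             # resource is advertised by the sandbox device plugin
```

Setting externalResourceProvider to true tells KubeVirt not to start its own device plugin for this resource, which fits the case where the sandbox device plugin already advertises it.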
Create a VirtualMachine manifest where each hostDevices entry references the resource published by the sandbox device plugin:
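As a hedged example, a minimal VirtualMachine requesting one passthrough NPU could be written as follows. The VM name, memory size, and container disk image are placeholders chosen for illustration. Only the deviceName value comes from the resource names discussed in this guide.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rbln-vm                         # hypothetical VM name
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
          hostDevices:
            - name: rbln1               # arbitrary device name visible in the VM spec
              deviceName: rebellions.ai/ATOM_CA22_PT
        resources:
          requests:
            memory: 4Gi                 # placeholder size
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/ubuntu:22.04   # placeholder guest image
```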
Tips
- Each requested unit corresponds to one VFIO-bound NPU function.
- To request multiple devices, increase both requests and limits and add multiple hostDevices entries (rbln1, rbln2, …).
- Use distinct resource names in sandboxDevicePlugin.resourceList (for example, rebellions.ai/ATOM_CA22_PT vs rebellions.ai/ATOM_CA25_PT) when a single Kubernetes cluster includes multiple RBLN device types so workloads can request the exact model they need.
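To make the multi-device tip concrete, the fragment below sketches the hostDevices list for a VM that attaches two NPUs. It shows only the hostDevices part of the manifest under spec.template.spec.domain.devices. The requests and limits mentioned above are not shown because their exact placement depends on the manifest you start from.

```yaml
# Fragment of spec.template.spec.domain.devices in a VirtualMachine
hostDevices:
  - name: rbln1
    deviceName: rebellions.ai/ATOM_CA22_PT
  - name: rbln2
    deviceName: rebellions.ai/ATOM_CA22_PT   # use rebellions.ai/ATOM_CA25_PT to request the other model
```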