Sandboxed Workloads with the Rebellions NPU Operator

Overview

The RBLN NPU Operator can expose NPUs to guest VMs through VFIO so that virtualized AI workloads achieve near-native acceleration. When sandbox mode is enabled, the operator performs three tasks:

Rebinds PCI devices to the VFIO driver
The vfio-manager DaemonSet ships the vfio-manage.sh helper through a ConfigMap. It detaches each RBLN PCI function from its native driver and reattaches it to vfio-pci, making the device safe for passthrough.
Announces VFIO-backed resources
The sandbox-device-plugin DaemonSet scans VFIO-managed NPUs and advertises resources such as rebellions.ai/ATOM_CA22_PT and rebellions.ai/ATOM_CA25_PT. Any workload that requests resources through a Kubernetes device plugin—including KubeVirt—can consume them.
Labels eligible nodes
Node Feature Discovery (NFD) reports the underlying hardware (feature.node.kubernetes.io/pci-1eff.present=true). The operator labels those nodes with rebellions.ai/npu.present=true and workload-specific keys so that only nodes capable of VFIO passthrough run the sandbox components.

After those controllers reconcile, the NPUs are exposed to KubeVirt VirtualMachine objects through the rebellions.ai/* resource names referenced in each hostDevices stanza.
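
The rebinding step described above can be pictured with the standard Linux sysfs interface. The sketch below is not the vfio-manage.sh helper shipped by the operator, whose contents are not reproduced on this page; it only illustrates the common driver_override mechanism. The function name rebind_to_vfio, the SYSFS_ROOT variable, and the example PCI address are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: rebind one PCI function to vfio-pci via sysfs.
# SYSFS_ROOT is an assumption added so the logic can be dry-run against
# a fake directory tree; on a real node it stays at /sys and root is required.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

rebind_to_vfio() {
    bdf="$1"                                   # e.g. 0000:3b:00.0 (example address)
    dev="$SYSFS_ROOT/bus/pci/devices/$bdf"

    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }

    # Detach the function from its current driver, if one is bound.
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi

    # Restrict the function to vfio-pci, then ask the kernel to re-probe it.
    echo "vfio-pci" > "$dev/driver_override"
    echo "$bdf" > "$SYSFS_ROOT/bus/pci/drivers_probe"
}

# Example (on a real node, as root): rebind_to_vfio 0000:3b:00.0
```

In a deployed cluster the vfio-manager DaemonSet handles this automatically; running such commands by hand is only useful for understanding or debugging the flow.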

Prerequisites

Kubernetes 1.19+ cluster
Worker nodes with RBLN NPUs (RBLN-CA12/CA22/CA25)
IOMMU enabled in the BIOS/UEFI and on the kernel command line (intel_iommu=on or amd_iommu=on), with the VFIO kernel modules available (vfio, vfio_pci, vfio_iommu_type1)
KubeVirt Operator installed and ready to schedule VMs
Node Feature Discovery (can be deployed by the Helm chart itself)
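
The IOMMU and VFIO prerequisites can be checked on a worker node before installing the chart. The following is a minimal pre-flight sketch, not part of the operator; the function names are hypothetical, and the paths are parameters only so the checks can be dry-run against sample files.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for a worker node.
# Defaults point at the real kernel interfaces.

# Succeeds if the kernel has created at least one IOMMU group,
# which indicates that the IOMMU is enabled and active.
iommu_active() {
    groups_dir="${1:-/sys/kernel/iommu_groups}"
    [ -d "$groups_dir" ] && [ -n "$(ls -A "$groups_dir" 2>/dev/null)" ]
}

# Prints each required VFIO module that is absent from a
# /proc/modules-style listing; prints nothing when all are loaded.
# Caveat: modules built into the kernel do not appear in /proc/modules.
missing_vfio_modules() {
    modules_file="${1:-/proc/modules}"
    for mod in vfio vfio_pci vfio_iommu_type1; do
        grep -q "^$mod " "$modules_file" || echo "$mod"
    done
}

# Example: iommu_active || echo "IOMMU appears to be disabled"
# Example: missing_vfio_modules   # then modprobe anything it lists
```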

Helm Deployment for Sandboxed Workloads

Install Helm (if necessary)

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 \
     && chmod 700 get_helm.sh \
     && ./get_helm.sh

Add the Rebellions chart repository

helm repo add rebellions https://rbln-sw.github.io/rbln-npu-operator
helm repo update

Configure the sandbox workload profile
The chart ships with a ready-made example, sample-values-SandboxWorkload.yaml. It enables the VFIO Manager and the Sandbox Device Plugin and sets suitable resource names:

name: rbln
nfd:
  enabled: true

operator:
  replicas: 1

sandboxDevicePlugin:
  enabled: true
  resourceList:
  - resourceName: ATOM_CA22_PT
    resourcePrefix: rebellions.ai
    productCardNames:
    - RBLN-CA22
  - resourceName: ATOM_CA25_PT
    resourcePrefix: rebellions.ai
    productCardNames:
    - RBLN-CA25

vfioManager:
  enabled: true

You can also copy the base values.yaml and toggle the relevant keys manually:
- Set sandboxDevicePlugin.enabled=true
- Set vfioManager.enabled=true
- Adjust sandboxDevicePlugin.resourceList[] to match each card model and the VFIO resource name your VMs expect
- Ensure nfd.enabled=true if NFD is not already running

Install with the sandbox profile

helm install rbln-npu-operator \
     rebellions/rbln-npu-operator \
     -n rbln-system --create-namespace \
     -f sample-values-SandboxWorkload.yaml
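
After the release is installed, it can help to confirm that the sandbox components are running and that nodes advertise the passthrough resources. The sketch below assumes the components run in the rbln-system namespace used by the helm command and that the resource names match the sample values file; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical post-install checks using standard kubectl commands.

# List the pods in the release namespace; the vfio-manager and
# sandbox-device-plugin DaemonSet pods should appear on NPU nodes.
show_operator_pods() {
    kubectl get pods -n rbln-system -o wide
}

# Print, per labeled NPU node, how many passthrough devices are allocatable.
# A column shows <none> when the node does not advertise that resource.
show_npu_allocatable() {
    kubectl get nodes -l rebellions.ai/npu.present=true \
        -o custom-columns='NODE:.metadata.name,CA22_PT:.status.allocatable.rebellions\.ai/ATOM_CA22_PT,CA25_PT:.status.allocatable.rebellions\.ai/ATOM_CA25_PT'
}

# Example: show_operator_pods && show_npu_allocatable
```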

Consuming the VFIO Resources from KubeVirt

Note

Enable KubeVirt's HostDevices feature gate and list each Rebellions PCI resource under permittedHostDevices.pciHostDevices before attaching them to VMs:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
      - HostDevices
    permittedHostDevices:
      pciHostDevices:
      - pciVendorSelector: 1eff:1220
        resourceName: rebellions.ai/ATOM_CA22_PT
      - pciVendorSelector: 1eff:1250
        resourceName: rebellions.ai/ATOM_CA25_PT
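
Once the KubeVirt resource has been updated, the permitted devices can be read back to confirm the change took effect. This is a small sketch that assumes the KubeVirt object is named kubevirt in the kubevirt namespace, as in the manifest above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one "selector<TAB>resourceName" line for each
# entry under permittedHostDevices.pciHostDevices.
permitted_host_devices() {
    kubectl get kubevirt kubevirt -n kubevirt \
        -o jsonpath='{range .spec.configuration.permittedHostDevices.pciHostDevices[*]}{.pciVendorSelector}{"\t"}{.resourceName}{"\n"}{end}'
}

# Example: permitted_host_devices
```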

Create a VirtualMachine manifest where each hostDevices entry references the resource published by the sandbox device plugin:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-npu-workload
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        app: vm-npu
    spec:
      domain:
        devices:
          hostDevices:
          - name: rbln0
            deviceName: rebellions.ai/ATOM_CA25_PT
            tag: "pci"
        resources:
          requests:
            rebellions.ai/ATOM_CA25_PT: 1
          limits:
            rebellions.ai/ATOM_CA25_PT: 1

Tips

Each requested unit corresponds to one VFIO-bound NPU function.
To request multiple devices, increase both requests and limits and add multiple hostDevices entries (rbln1, rbln2, …).
Use distinct resource names in sandboxDevicePlugin.resourceList (for example, rebellions.ai/ATOM_CA22_PT vs rebellions.ai/ATOM_CA25_PT) when a single Kubernetes cluster includes multiple RBLN device types so workloads can request the exact model they need.
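
To confirm that a running guest received as many NPU functions as its manifest requested, the PCI bus can be inspected from inside the VM. The sketch below assumes the pciutils package is installed in the guest and relies on the 1eff vendor ID used throughout this page; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical in-guest check: count PCI functions with vendor ID 1eff.
count_rbln_devices() {
    # -d 1eff: filters by vendor ID; -n prints numeric IDs, one line per function.
    lspci -n -d 1eff: | wc -l | tr -d '[:space:]'
}

# Example: [ "$(count_rbln_devices)" -eq 1 ] || echo "unexpected NPU count"
```

The count should equal the number of hostDevices entries in the VirtualMachine manifest.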