
NPU Driver Installation

The RBLN NPU Operator supports two ways to install the NPU kernel driver on each node. Pick one before deploying the operator chart:

  • Container driver mode (driver.enabled=true): The operator deploys a DaemonSet based on the RBLNDriver CR spec and installs and manages the kernel driver. Use this when the operator must consistently control the driver lifecycle, including installation, upgrades, and rollbacks, through Kubernetes-native controls.
  • Host driver mode (driver.enabled=false): You install the kernel driver directly on each node's host OS (runfile installer, distro package, DKMS, etc.) before deploying the operator. The operator then automatically detects the driver installed on the host and does not perform installation, upgrades, or rollbacks through the container driver.

By default, the chart sets driver.enabled=false, so host driver mode is used unless you explicitly enable container driver mode.

This page covers both installation methods and how to verify the active mode for each node. For the operator chart install itself, see Installing the NPU Operator.

Container driver mode

Set driver.enabled=true when running helm install or helm upgrade. The operator creates the RBLNDriver CR, deploys the driver DaemonSet, and keeps the kernel driver in a consistent state on each node.

driver:
  enabled: true
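Equivalently, the flag can be set directly on the command line. This mirrors the host driver mode example shown later on this page, with only the flag value flipped:

```shell
$ export CHART_VERSION=0.4.0

$ helm install rbln-npu-operator \
     oci://docker.io/rebellions/rbln-npu-operator-chart \
     --version ${CHART_VERSION} \
     --namespace rbln-system --create-namespace \
     --set driver.enabled=true
```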

For the full lifecycle, including driver image selection, upgrade rollout policy, manual upgrades, and upgrade exclusions for each node, see NPU Driver Upgrade Workflow.

Host driver mode

Install the kernel driver on every NPU node first (see the RBLN Driver Installation Guide) and make sure rbln-smi is in the host's PATH. Then install the operator with driver.enabled=false:

driver:
  enabled: false

Or set it directly on the command line:

$ export CHART_VERSION=0.4.0

$ helm install rbln-npu-operator \
     oci://docker.io/rebellions/rbln-npu-operator-chart \
     --version ${CHART_VERSION} \
     --namespace rbln-system --create-namespace \
     --set driver.enabled=false

With driver.enabled=false, the chart does not create the RBLNDriver CR or deploy the driver DaemonSet. All other components deploy normally because they only require the kernel module to be loaded and operate independently of how the kernel module was installed.

Operational responsibilities

  • Kernel updates: if the host kernel is upgraded, you must reinstall the driver.
  • Driver upgrades: use existing host management tools such as apt, yum, or DKMS. After the upgrade, reboot the node so the new kernel module is applied.
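After a host-side driver install, upgrade, or kernel update, a quick sanity check on the node (assuming rbln-smi is on the host's PATH, as required above) might look like:

```shell
$ command -v rbln-smi    # confirm the CLI is on the host's PATH
$ rbln-smi --version     # confirm the installed driver responds
```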

Mixed clusters

Container driver mode nodes and host driver mode nodes can run together under a single RBLNClusterPolicy. Even when driver.enabled=true is set, the operator detects each node's state individually and works in the appropriate mode for that node.

  1. The driver DaemonSet starts on the node.
  2. The Driver Manager initContainer runs chroot /host rbln-smi --version to check the driver installed on the host.
  3. If the check succeeds, Driver Manager adds the rebellions.ai/npu.deploy.driver=pre-installed label to the node and exits without performing container-based driver installation. The host driver is left in place.
  4. The same detection logic continues to apply on subsequent reconciles, and the overall behavior remains idempotent.
  5. All other components managed by the operator deploy and run normally using the kernel module installed on the host.
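The detection in steps 2 and 3 can be sketched as a small shell routine. The function name is illustrative and the real initContainer's implementation may differ; it is shown here only to make the branching explicit:

```shell
# Sketch of the host-driver detection performed by the Driver Manager
# initContainer. detect_host_driver is an illustrative name; the real
# initContainer runs: chroot /host rbln-smi --version
detect_host_driver() {
  if "$@" >/dev/null 2>&1; then
    # Host driver found: the node gets the pre-installed label and
    # container-based installation is skipped.
    echo "pre-installed"
  else
    # No host driver detected: the container driver is installed.
    echo "true"
  fi
}

# Stand-ins for the rbln-smi check succeeding / failing:
detect_host_driver true     # prints: pre-installed
detect_host_driver false    # prints: true
```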

This automatic detection feature supports the following two common scenarios:

  • Driver installation methods can be mixed across nodes and still operate normally. Some nodes can have drivers installed directly on the host, while other nodes can use the container driver. The operator automatically selects the appropriate installation method for each node, so no separate node configuration is required.
  • Switching a node to host driver mode after installation is straightforward. Install the kernel driver on the host and restart the driver pod. The driver pod detects the host driver and automatically stops the container installation.
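As a concrete sketch of the second scenario, assuming the driver pods can be identified by a label selector (the app=rbln-driver selector below is an assumption; check the actual pod labels in your cluster):

```shell
# 1. Install the kernel driver on the node's host OS
#    (see the RBLN Driver Installation Guide), then confirm it responds:
$ rbln-smi --version

# 2. Restart the driver pod on that node so detection re-runs.
#    The app=rbln-driver selector is illustrative; verify the real labels first.
$ kubectl delete pod -n rbln-system \
      -l app=rbln-driver \
      --field-selector spec.nodeName=<node-name>
```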

For clusters where every node uses a driver installed directly on the host, prefer driver.enabled=false.

Verifying which mode is active per node

After install, check which mode each node is using:

$ kubectl get nodes -L rebellions.ai/npu.deploy.driver
  • If the value is true, the container driver managed by the operator is active on the node.
  • If the value is pre-installed, the operator detected that a driver already exists on the host, so it skips installation through the container driver.
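Illustrative output for a mixed two-node cluster (node names and versions are hypothetical; the final column comes from the -L flag):

```shell
$ kubectl get nodes -L rebellions.ai/npu.deploy.driver
NAME     STATUS   ROLES    AGE   VERSION   NPU.DEPLOY.DRIVER
node-a   Ready    <none>   12d   v1.29.4   true
node-b   Ready    <none>   12d   v1.29.4   pre-installed
```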