
RBLN NPU Operator Installation Guide for OpenShift

Overview

OpenShift Container Platform is Red Hat's enterprise-grade Kubernetes container platform built on open-source technologies.

This guide explains how to install and configure the RBLN NPU Operator on OpenShift Container Platform. With OpenShift, you can easily provision and manage RBLN NPUs in enterprise environments that require high levels of security, stability, and scalability.

The RBLN NPU Operator uses the Kubernetes operator framework to automate the deployment and lifecycle management of all Rebellions software components required for provisioning RBLN NPUs. These components include the Driver Manager, Kubernetes Device Plugin, Container Toolkit, automatic node labeling via NPU Feature Discovery, Prometheus-based metrics exporter, and others.

The operator is available as a certified operator in the OpenShift ecosystem and can be installed and managed through the Operator Lifecycle Manager (OLM), so it integrates with existing cluster operations.

For more details, refer to CLOUD-NATIVE SUPPORT > Kubernetes Support > RBLN NPU Operator.

Prerequisites

Ensure the following before proceeding:

  • OpenShift Container Platform 4.19–4.20 (validated versions)
  • At least one worker node equipped with an RBLN NPU
  • cluster-admin privileges
  • OpenShift CLI (oc) installed and configured
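The version requirement can be checked from the command line. The following is a minimal sketch, not part of any Rebellions tooling: the function names is_validated_version and check_cluster_version are hypothetical, and it assumes the cluster version is read from the ClusterVersion resource named version.

```shell
# Hypothetical helper: succeeds only for the validated 4.19.x and 4.20.x releases.
is_validated_version() {
  case "$1" in
    4.19.*|4.20.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reads the cluster version with oc and reports whether
# it falls in the validated range.
check_cluster_version() {
  version="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" || return 1
  if is_validated_version "$version"; then
    echo "OpenShift $version is a validated version"
  else
    echo "OpenShift $version is outside the validated range (4.19-4.20)" >&2
    return 1
  fi
}
```

Run check_cluster_version after logging in with oc login. One way to confirm cluster-admin-level access is oc auth can-i '*' '*' --all-namespaces, which should answer yes.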


Installing the Node Feature Discovery Operator

Node Feature Discovery (NFD) enables Kubernetes to detect hardware capabilities on each node and expose them as labels.

The RBLN NPU Operator relies on these labels to identify nodes equipped with NPUs and schedule components accordingly. Without NFD, Kubernetes cannot distinguish NPU-enabled nodes from standard nodes.

For installation steps, refer to the official Red Hat documentation: Node Feature Discovery Operator – Red Hat OpenShift Container Platform 4.20

Verify RBLN NPU detection

Rebellions NPUs are identified by PCI vendor ID 1eff.

Verify that NPU-equipped nodes are labeled:

$ oc describe node <worker-node> | grep pci-1eff

Expected output:

feature.node.kubernetes.io/pci-1eff.present=true

You can also check all nodes at once:

$ oc describe nodes | grep pci-1eff

If the label is not present, ensure that:

  • NFD is installed and running
  • The NPU is visible on the host system
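To check whether the NPU is visible on the host, one option is to look for vendor ID 1eff in sysfs. The sketch below is a hypothetical helper (the name list_rbln_pci_devices is not from the Rebellions documentation) meant to be run on the node itself, for example from a shell opened with oc debug node/<worker-node> followed by chroot /host.

```shell
# Hypothetical helper: print the PCI address of every device whose vendor ID
# is 0x1eff. The optional argument overrides the sysfs directory to scan.
list_rbln_pci_devices() {
  sysfs_root="${1:-/sys/bus/pci/devices}"
  for dev in "$sysfs_root"/*; do
    [ -r "$dev/vendor" ] || continue
    if [ "$(cat "$dev/vendor")" = "0x1eff" ]; then
      basename "$dev"
    fi
  done
}
```

Empty output means no device with that vendor ID was found on the node. From the cluster side, oc get nodes -l feature.node.kubernetes.io/pci-1eff.present=true lists only the nodes that already carry the label.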


Configuring Kernel Boot Parameters

Before installing the RBLN NPU Operator, configure the kernel boot parameters that RBLN NPUs require for stable operation under OpenShift.

Parameters are applied using the OpenShift MachineConfig Operator (MCO). For the full procedure, refer to the Kernel Arguments Tuning Guide.
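As a rough illustration of how the MCO applies kernel arguments, the sketch below writes a MachineConfig manifest for the worker pool. It is not the procedure from the Kernel Arguments Tuning Guide: <kernel-argument> is a placeholder for the values listed there, and the resource name 99-worker-rbln-kargs and the file name are hypothetical.

```shell
# Sketch only: <kernel-argument> is a placeholder; take the real values from the
# Kernel Arguments Tuning Guide. The name 99-worker-rbln-kargs is hypothetical.
cat <<'EOF' > rbln-kargs-machineconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-rbln-kargs
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - <kernel-argument>
EOF
```

After the placeholder is replaced, oc apply -f rbln-kargs-machineconfig.yaml submits the manifest. The MCO then reboots the nodes in the worker pool to apply the change, and oc get mcp worker shows the rollout progress.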


Installing the RBLN NPU Operator

Installing the RBLN NPU Operator deploys the controller responsible for managing all NPU-related components across the cluster, including drivers, device plugins, and monitoring tools. The operator can be installed through the OpenShift web console or from the CLI; both paths use OLM.

Image Pull Secret Required

Before installing, create a docker-registry secret to authenticate against repo.rebellions.ai. See Creating an Image Pull Secret for details.
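A docker-registry secret is typically created with oc create secret docker-registry. The wrapper below is a sketch under assumptions: the function name is hypothetical, the secret name is whatever Creating an Image Pull Secret specifies, and the rbln-system namespace must already exist.

```shell
# Hypothetical wrapper around `oc create secret docker-registry`.
# Arguments: <secret-name> <username> <password>. The secret name is an
# assumption; use the name given in "Creating an Image Pull Secret".
create_rbln_pull_secret() {
  if [ "$#" -ne 3 ]; then
    echo "usage: create_rbln_pull_secret <secret-name> <username> <password>" >&2
    return 2
  fi
  oc create secret docker-registry "$1" \
    --namespace rbln-system \
    --docker-server=repo.rebellions.ai \
    --docker-username="$2" \
    --docker-password="$3"
}
```

Passing the password as an argument leaves it in the shell history, so reading it from an environment variable or a prompt may be preferable in shared environments.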

Installing via the web console

  1. Navigate to Operators > OperatorHub in the OpenShift web console.
  2. Select All Projects.
  3. Search for RBLN Operator and select RBLN NPU Operator.
  4. On the Install Operator page, configure the following options:
    • Update channel: stable
    • Version: 0.3.1
    • Installed Namespace: rbln-system (recommended)
  5. Click Install.

Figure: RBLN NPU Operator install page

Installing via the CLI

  1. Create a namespace for the RBLN NPU Operator:

    $ oc create namespace rbln-system
    
  2. Create an OperatorGroup CR and save it as rbln-npu-operatorgroup.yaml:

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: rbln-npu-operator-group
      namespace: rbln-system
    spec:
      targetNamespaces:
      - rbln-system
    
    $ oc create -f rbln-npu-operatorgroup.yaml
    
  3. Get the default channel and current CSV from the rbln-npu-operator package manifest:

    $ CHANNEL=$(oc get packagemanifest rbln-npu-operator -n openshift-marketplace -o jsonpath='{.status.defaultChannel}')
    $ STARTING_CSV=$(oc get packagemanifests/rbln-npu-operator -n openshift-marketplace -o json | jq -r --arg channel "$CHANNEL" '.status.channels[] | select(.name == $channel) | .currentCSV')
    
  4. Create a Subscription CR using the variables above and save it as rbln-npu-subscription.yaml:

    $ cat <<EOF > rbln-npu-subscription.yaml
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: rbln-npu-operator
      namespace: rbln-system
    spec:
      channel: $CHANNEL
      name: rbln-npu-operator
      installPlanApproval: Automatic
      source: certified-operators
      sourceNamespace: openshift-marketplace
      startingCSV: $STARTING_CSV
    EOF
    $ oc create -f rbln-npu-subscription.yaml
    
  5. Verify the install plan has been created:

    $ oc get installplan -n rbln-system
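Once the install plan exists, OLM installs the ClusterServiceVersion (CSV), and the operator is usable when the CSV phase reaches Succeeded. The polling helper below is a sketch: the function name wait_for_csv and its default retry values are hypothetical, and it assumes the operator was installed into rbln-system.

```shell
# Hypothetical helper: poll the CSV phase until it is Succeeded or the
# attempts run out.
# Arguments: <csv-name> [attempts] [seconds-between-attempts]
wait_for_csv() {
  csv="$1"; attempts="${2:-30}"; delay="${3:-10}"
  phase=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(oc get csv "$csv" -n rbln-system -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "Succeeded" ]; then
      echo "$csv: Succeeded"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$csv: not Succeeded after $attempts attempts (last phase: ${phase:-unknown})" >&2
  return 1
}
```

With the variable from step 3 still set, wait_for_csv "$STARTING_CSV" blocks until the operator is installed or the retries are exhausted.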
    

Create the RBLNClusterPolicy Instance

The RBLNClusterPolicy defines how the operator configures NPU-related components across the cluster, including which components to deploy and which container images to use.

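The commands below extract the example RBLNClusterPolicy from the alm-examples annotation of the installed CSV, save it as clusterpolicy.json, and apply it. They reuse the STARTING_CSV variable set during the CLI installation. The generated clusterpolicy.json contains container image versions for each component (Device Plugin, Driver Manager, Metrics Exporter, etc.).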

$ oc get csv -n rbln-system $STARTING_CSV \
    -o jsonpath="{.metadata.annotations['alm-examples']}" \
    | jq -r 'map(select(.kind == "RBLNClusterPolicy")) | .[0]' \
    > clusterpolicy.json
$ oc apply -f clusterpolicy.json

Component image versions

Verify that container image versions match the latest release. Refer to the Release Notes for the recommended component versions.
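To compare the versions against the Release Notes, it helps to list the image and version fields of the generated file in one place. The helper below is a simple text-based sketch with a hypothetical name; it makes no assumption about the RBLNClusterPolicy schema and matches any key containing "image" or "version", so unrelated keys such as apiVersion also appear.

```shell
# Hypothetical helper: print every "key": "value" pair in a JSON manifest whose
# key contains "image" or "version" (case-insensitive), one pair per line.
list_image_fields() {
  grep -oiE '"[a-z_]*(image|version)[a-z_]*"[[:space:]]*:[[:space:]]*"[^"]*"' "$1"
}
```

For example, list_image_fields clusterpolicy.json prints the candidate fields to check. The same call works on any other manifest extracted from alm-examples.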


Create the RBLNDriver Instance

The RBLNDriver resource deploys the NPU driver on each eligible node.

The driver enables the operating system to communicate with the NPU hardware and is required for workloads to use the device.

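The commands below extract the example RBLNDriver from the alm-examples annotation of the installed CSV, save it as driver.json, and apply it. The generated driver.json contains the driver container image version.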

$ oc get csv -n rbln-system $STARTING_CSV \
    -o jsonpath="{.metadata.annotations['alm-examples']}" \
    | jq -r 'map(select(.kind == "RBLNDriver")) | .[0]' \
    > driver.json
$ oc apply -f driver.json

Driver image version

Verify that the driver version matches the latest release. Refer to the Release Notes for the recommended driver version.


Verify Installation

After installing the operator, verify that the required resources and components are correctly deployed.

Confirm that the RBLNClusterPolicy and RBLNDriver resources are created:

$ oc get rblnclusterpolicies.rebellions.ai
NAME                  AGE
rbln-cluster-policy   8m


$ oc get rblndrivers.rebellions.ai
NAME                  AGE
rbln-driver           8m

Verify that all pods in the rbln-system namespace are running:

$ oc get pods -n rbln-system
NAME                                             READY   STATUS    AGE
controller-manager-797798d7b8-rjzht              1/1     Running   8m
rbln-device-plugin-4qgxc                         1/1     Running   8m
rbln-metrics-exporter-jghbg                      1/1     Running   8m
rbln-npu-feature-discovery-zg47r                 1/1     Running   8m
rbln-container-toolkit-ttz2c                     1/1     Running   8m
rbln-driver-ubuntu22.04-6.8.0-90-generic-6gtrc   1/1     Running   8m
rbln-operator-validator-qhf4t                    1/1     Running   8m
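On clusters with many nodes the pod list gets long, so filtering for pods that are not yet healthy can be easier than reading it by eye. The filter below is a sketch with a hypothetical name; it assumes the default oc get pods column order, where NAME, READY, and STATUS are the first three columns.

```shell
# Hypothetical helper: read `oc get pods` output on stdin and print the name of
# every pod that is not Running or whose READY count is incomplete (e.g. 0/1).
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

Usage: oc get pods -n rbln-system | not_ready_pods. Empty output means every listed pod is Running and fully ready. Pods that have finished and report Completed would also be listed, so treat those as informational.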