RBLN Container Toolkit

The RBLN Container Toolkit enables container runtimes to access Rebellions NPU devices using the Container Device Interface (CDI) specification. It automatically discovers host RBLN libraries and tools, generates CDI specs, and configures your container runtime, enabling containers to use NPU hardware with zero application changes.

Scope

The Container Toolkit is currently responsible only for CDI spec generation and runtime configuration. It exposes RBLN libraries and tools (such as rbln-smi) to containers via CDI. For NPU device allocation using RSD groups, see the NPU Allocation guide. These features will be integrated into a unified toolkit in a future release.

How It Works

                          ┌──────────────────────────────────────────┐
  Host System             │           RBLN Container Toolkit         │
 ─────────────            │                                          │
                          │  1. Discover    RBLN libs & tools        │
  /usr/lib64/             │       ↓         on the host              │
    librbln-*.so ────────►│  2. Generate    CDI spec (rbln.yaml)     │
  /usr/bin/               │       ↓                                  │
    rbln-smi ────────────►│  3. Configure   container runtime        │
                          │       ↓         (containerd/crio/docker) │
                          │  4. Run Hook    to update ldcache        │
                          │                 inside containers        │
                          └─────────────────────┬────────────────────┘
                          ┌──────────────────────────────────────────┐
  Container               │  $ docker run --device rebellions.ai/    │
                          │      npu=all my-app                      │
                          │                                          │
                          │  ✓ RBLN libraries mounted                │
                          │  ✓ Tools available (rbln-smi)            │
                          │  ✓ ldcache updated automatically         │
                          └──────────────────────────────────────────┘

The toolkit provides three binaries:

| Binary | Role |
|---|---|
| rbln-ctk | Main CLI: generates CDI specs, configures runtimes, inspects the system |
| rbln-ctk-daemon | Kubernetes daemon: automated setup with health endpoints and graceful shutdown |
| rbln-cdi-hook | OCI hook: runs inside containers to update ldcache and create symlinks |

The toolkit ships in two distribution flavors with intentionally different runtime profiles:

| Flavor | Target | Device injection | Refresh path |
|---|---|---|---|
| DEB / RPM package | Standalone Docker hosts | rbln-ctk injects /dev/rbln* and the matching /dev/rsd* group node via CDI. NPU↔RSD mapping is resolved through librbln-ml, which the package pulls in as a runtime dependency. | Host-side rbln-cdi-refresh.path systemd unit, auto-enabled by postinstall. See Systemd Integration. |
| Container image | Kubernetes DaemonSet | rbln-ctk-daemon skips device-node emission; the device plugin (or DRA) owns per-Pod /dev/rbln* allocation. Because the NPU↔RSD resolver is never invoked in this mode, the image is built without a librbln-ml dependency. | In-process loop inside rbln-ctk-daemon. See Automatic CDI Refresh. |

Prerequisites

| OS | Architecture | Container Runtime |
|---|---|---|
| Ubuntu 22.04/24.04 | x86_64 | containerd, CRI-O, Docker |
| RHEL 9+ | x86_64 | containerd, CRI-O, Docker |

  • RBLN driver: install it on the host before the container toolkit. The toolkit's DEB / RPM package declares the driver's UMD library as a hard runtime dependency (librbln-ml3 on Debian/Ubuntu, librbln-ml on RHEL-family hosts), so installing the toolkit on a host without the driver fails with an unmet-dependency error from apt-get or dnf.

Installation

Ubuntu / Debian

  1. Add the Rebellions official GPG key (skip if already configured):

    $ sudo apt-get update
    $ sudo apt-get install ca-certificates curl
    $ sudo install -m 0755 -d /etc/apt/keyrings
    $ sudo curl -fsSL https://nexus.rebellions.ai/repository/raw-public/rebellions.asc -o /etc/apt/keyrings/rebellions.asc
    $ sudo chmod a+r /etc/apt/keyrings/rebellions.asc
    
  2. Add the repository to APT sources (skip if already configured):

    $ echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/rebellions.asc] https://nexus.rebellions.ai/repository/apt-public/ stable main" | \
      sudo tee /etc/apt/sources.list.d/rebellions-apt-public.list > /dev/null
    
  3. Update APT and install:

    $ sudo apt-get update
    $ sudo apt-get install rbln-container-toolkit
    

RHEL

  1. Register the Rebellions yum repository (skip if already configured):

    $ sudo tee /etc/yum.repos.d/rebellions.repo > /dev/null <<EOF
    [rebellions-stable]
    name=Rebellions Stable
    baseurl=https://nexus.rebellions.ai/repository/yum-public/stable/x86_64/Packages/
    enabled=1
    gpgcheck=0
    EOF
    
  2. Refresh the cache and install:

    $ sudo dnf clean all
    $ sudo dnf install -y rbln-container-toolkit
    

Unsigned RPM packages

The current RPM packages are not yet signed, so gpgcheck=0 is required to install them. Switch the entry back to gpgcheck=1 once signed packages become available.

Quick Start

The quickest way to enable NPU access in containers:

# 1. Generate CDI specification (discovers RBLN libraries on host)
$ sudo rbln-ctk cdi generate

# 2. Configure your container runtime for CDI support
$ sudo rbln-ctk runtime configure

# 3. Run a container with NPU access (all NPUs)
$ docker run --device rebellions.ai/npu=all -it ubuntu:22.04

The toolkit auto-detects your runtime and applies the appropriate configuration. To expose only a subset of the host's NPUs, see Device Selection.

Verify Setup

# Check what was discovered on the host
$ rbln-ctk cdi list

# View detected runtime and configuration
$ rbln-ctk info

# Use NPU tools inside a container
$ docker run --device rebellions.ai/npu=all -it ubuntu:22.04 rbln-smi

Preview Before Applying

The commands that modify the host (cdi generate and runtime configure) support --dry-run, which shows what would change without modifying anything:

$ rbln-ctk cdi generate --dry-run
$ rbln-ctk runtime configure --dry-run

Device Selection

The CDI spec exposes two forms of device handle. Pick the one that matches the workload's scope:

| Handle | Selects |
|---|---|
| rebellions.ai/npu=all | Every NPU and every RSD group on the host (recommended) |
| rebellions.ai/npu=N | A single NPU by index (0, 1, ...). The NPU's RSD group is attached automatically. |

Use the =N handle to select a single NPU by index:

$ docker run \
  --device rebellions.ai/npu=N \
  -it IMAGE_NAME:TAG

Repeat --device to attach multiple NPUs. When the selected NPUs share an RSD group, the container can run tensor-parallel workloads across them:

$ docker run \
  --device rebellions.ai/npu=0 \
  --device rebellions.ai/npu=1 \
  -it IMAGE_NAME:TAG
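
When the NPU indices come from a script, the repeated --device flags can be generated rather than typed by hand. The following is a minimal sketch under that assumption; npu_device_args is a hypothetical helper, not something the toolkit ships, and it only prints the per-NPU handles described above.

```shell
# npu_device_args is a hypothetical helper, not part of rbln-ctk.
# It prints one "--device rebellions.ai/npu=N" flag per index given.
npu_device_args() {
  args=""
  for idx in "$@"; do
    case "$idx" in
      ''|*[!0-9]*)
        # Reject anything that is not a non-negative integer.
        echo "invalid NPU index: $idx" >&2
        return 1
        ;;
    esac
    args="${args:+$args }--device rebellions.ai/npu=$idx"
  done
  printf '%s\n' "$args"
}

# Example (the substitution is left unquoted on purpose so the flags split):
#   docker run $(npu_device_args 0 1) -it IMAGE_NAME:TAG
```

The helper does not check whether the selected NPUs share an RSD group; that still depends on the host topology.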

On Kubernetes runtimes (containerd / CRI-O), only the =all handle is published — it carries the library mounts and the rbln-smi tool bind, with no device nodes attached. Per-Pod /dev/rbln* allocation is owned by the device plugin or the DRA driver. The per-NPU handle applies to standalone Docker hosts.

Regenerate the CDI spec after RSD topology changes

rbln-ctk cdi generate binds each =N handle in the CDI spec to the /dev/rsd* nodes present on the host at generate time. The automatic refresh paths (Systemd Integration, Automatic CDI Refresh) fire only on driver or toolkit library changes. After re-grouping NPUs with rbln-smi group -c, -a, or -d, rerun sudo rbln-ctk cdi generate or wait for the next driver refresh so the container runtime can resolve the new =N handles. The =all handle remains valid across RSD group changes.

Compatibility with npu=runtime

rebellions.ai/npu=runtime from v0.1.x keeps working as an alias of =all with identical content, so existing manifests and device-plugin builds do not need to be rewritten when upgrading to v0.2. Prefer =all or a per-device selector for new manifests; the alias may be retired in a future release once downstream consumers have migrated.
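
Because the alias and =all carry identical content, moving a manifest off the alias is a plain text substitution. The sketch below shows one way to do that; migrate_npu_alias is a hypothetical helper name, and the migration itself is optional while the alias is still supported.

```shell
# migrate_npu_alias is a hypothetical helper, not part of the toolkit.
# It reads a manifest on stdin and rewrites the legacy alias to =all.
migrate_npu_alias() {
  sed 's#rebellions\.ai/npu=runtime#rebellions.ai/npu=all#g'
}

# Example: preview the rewritten manifest without touching the original.
#   migrate_npu_alias < pod.yaml
```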

CLI Reference

rbln-ctk cdi generate

Discovers RBLN libraries and tools, then writes a CDI spec.

$ sudo rbln-ctk cdi generate

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output path | /var/run/cdi/rbln.yaml |
| -f, --format | Output format (yaml or json) | yaml |
| --driver-root | Root path for driver files (CoreOS: /host) | / |
| --container-library-path | Isolated library path in container | (same as host) |
| --dry-run | Preview without writing | false |

rbln-ctk runtime configure

Auto-detects the active container runtime and enables CDI support.

$ sudo rbln-ctk runtime configure

| Flag | Description | Default |
|---|---|---|
| -r, --runtime | Force specific runtime (containerd, crio, docker) | (auto-detect) |
| --config-path | Custom runtime config path | (runtime default) |
| --dry-run | Preview changes | false |

rbln-ctk cdi list

Lists discovered RBLN libraries and tools.

$ rbln-ctk cdi list

rbln-ctk info

Displays system information including detected runtime and configuration.

$ rbln-ctk info

Kubernetes Deployment

For Kubernetes clusters, deploy the toolkit as a DaemonSet. The daemon (rbln-ctk-daemon) handles the entire lifecycle:

  1. Generates CDI spec on startup
  2. Configures the container runtime
  3. Restarts the container runtime
  4. Serves health check endpoints
  5. Cleans up on SIGTERM (pod termination)

Container Image

The official container image is available on Docker Hub:

$ docker pull rebellions/rbln-container-toolkit:latest

Deploy

$ kubectl apply -f deployments/kubernetes/daemonset.yaml

Health Endpoints

| Endpoint | Probe Type | Returns 200 When |
|---|---|---|
| /live | Liveness | Daemon process is running |
| /ready | Readiness | Setup is complete |
| /startup | Startup | Initialization finished |
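
Outside of Kubernetes probes, the same endpoints can be polled from a script, for example to wait until setup has finished. The sketch below assumes the daemon is reachable on localhost at the default health port (8080) and that curl is installed; wait_for_ready is a hypothetical helper name.

```shell
# wait_for_ready is a hypothetical helper, not part of the toolkit.
# It polls a health URL until it returns HTTP 200 or the attempts run out.
wait_for_ready() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -w '%{http_code}' prints only the status code; 000 means no response.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || code=000
    if [ "$code" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (assumes localhost and the default health port):
#   wait_for_ready http://localhost:8080/ready 60
```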

Environment Variables

| Variable | Description | Default |
|---|---|---|
| RBLN_CTK_DAEMON_RUNTIME | Container runtime | (auto-detect) |
| RBLN_CTK_DAEMON_HOST_ROOT | Host root mount path | / (host), /host (container) |
| RBLN_CTK_DAEMON_DRIVER_ROOT | Driver root path for CDI spec | / |
| RBLN_CTK_DAEMON_CDI_SPEC_DIR | CDI spec directory | /var/run/cdi |
| RBLN_CTK_DAEMON_CONTAINER_LIBRARY_PATH | Container library path for library isolation | (empty) |
| RBLN_CTK_DAEMON_SOCKET | Runtime socket path | (auto-detect) |
| RBLN_CTK_DAEMON_HEALTH_PORT | Health check port | 8080 |
| RBLN_CTK_DAEMON_SHUTDOWN_TIMEOUT | Graceful shutdown timeout | 30s |
| RBLN_CTK_DAEMON_PID_FILE | PID file path | /run/rbln/toolkit.pid |
| RBLN_CTK_DAEMON_NO_CLEANUP_ON_EXIT | Skip cleanup on exit | false |
| RBLN_CTK_DAEMON_REFRESH_INTERVAL | Poll interval for rbln-ctk-daemon's in-process CDI spec refresh (0 disables). DaemonSet-only; the DEB/RPM CLI path uses the systemd unit instead. See Automatic CDI Refresh. | 60s |
| RBLN_CTK_DAEMON_DEBUG | Enable debug logging | false |
| RBLN_CTK_DAEMON_FORCE | Terminate existing instance before starting | false |

Automatic CDI Refresh

rbln-ctk-daemon (the Kubernetes DaemonSet binary) drives the refresh loop. On DEB/RPM hosts, the systemd unit described in Systemd Integration handles the same job.

The daemon polls the rbln version: marker embedded in the host's RBLN UMD libraries. When the daemon detects a marker change, it regenerates /var/run/cdi/rbln.yaml. Because the daemon tracks library changes automatically, operators can reinstall or upgrade the driver while keeping the DaemonSet in place. The container runtime mounts the current library into each new container at startup.
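
The poll-and-compare idea can be illustrated with a small shell sketch. This is not the daemon's implementation. It assumes the marker appears in the library as a printable string that starts with "rbln version:" and ends at a NUL byte, and read_marker, marker_changed, and the state file are all hypothetical.

```shell
# Hypothetical helpers illustrating change detection; not the daemon's code.

# Print the first "rbln version:" string found in a library file.
# Assumption: the marker is a NUL-terminated printable string.
read_marker() {
  tr '\000' '\n' < "$1" | grep -a -o 'rbln version:.*' | head -n 1
}

# Return 0 when the marker differs from the one saved in the state file,
# and record the new marker; return 1 when nothing changed.
marker_changed() {
  lib=$1
  state=$2
  current=$(read_marker "$lib")
  previous=$(cat "$state" 2>/dev/null || true)
  [ "$current" != "$previous" ] || return 1
  printf '%s\n' "$current" > "$state"
  return 0
}

# Example (LIB and STATE are placeholders chosen by the operator):
#   marker_changed "$LIB" "$STATE" && sudo rbln-ctk cdi generate
```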

RBLN_CTK_DAEMON_REFRESH_INTERVAL (or the --refresh-interval flag) controls the refresh period. The default is 60s; setting 0 disables polling. The daemon writes the spec to a temporary file, then swaps it in with fsync followed by rename, so the container runtime always reads a complete spec.
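
The write-then-rename pattern described above can be reproduced in shell for custom workflows that write a spec themselves. The sketch below is an illustration of the pattern only, not the daemon's code; atomic_write is a hypothetical helper, and the 0644 mode is an assumption about what the runtime needs to read the file.

```shell
# atomic_write is a hypothetical helper illustrating write-then-rename.
# It copies stdin to a temp file next to the target, flushes it to disk,
# then renames it over the target so readers never see a partial file.
atomic_write() {
  target=$1
  dir=$(dirname "$target")
  tmp=$(mktemp "$dir/.cdi-spec.XXXXXX") || return 1
  if ! cat > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # mktemp creates the file as 0600; widen it (assumed mode).
  chmod 0644 "$tmp"
  # Flush the temp file; fall back to a global sync if the
  # file-argument form is unsupported.
  sync "$tmp" 2>/dev/null || sync
  # A rename within one filesystem replaces the target atomically.
  mv -f "$tmp" "$target"
}

# Example: atomic_write /var/run/cdi/rbln.yaml < new-spec.yaml
```

Creating the temporary file in the same directory as the target keeps the rename on one filesystem, which is what makes the swap atomic.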

The /ready endpoint response includes a cdi-refresh block that reports the last run time, the number of libraries discovered, and the most recent refresh error (if any).

Kubernetes Pod Example

apiVersion: v1
kind: Pod
metadata:
  name: rbln-workload
spec:
  containers:
  - name: app
    image: ubuntu:22.04
    resources:
      limits:
        rebellions.ai/npu: "1"

CoreOS / OpenShift

For Red Hat CoreOS environments where the host filesystem is mounted at /host:

env:
  - name: RBLN_CTK_DAEMON_HOST_ROOT
    value: "/host"

Advanced Configuration

Library Isolation

By default, RBLN libraries are bind-mounted at their host paths inside the container. If this causes conflicts (e.g., different glibc versions), use library isolation:

$ sudo rbln-ctk cdi generate --container-library-path /rbln/lib64

This mode:

  • Mounts libraries to an isolated path (/rbln/lib64) instead of host paths
  • Uses the CDI hook to run ldconfig inside the container at startup
  • Avoids LD_LIBRARY_PATH — the ldcache handles library resolution natively
  • Supports setuid binaries (which ignore LD_LIBRARY_PATH)
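
To check that the hook registered the isolated path, the ldcache can be inspected from inside the container. The sketch below filters standard `ldconfig -p` output for entries whose name contains "librbln-"; rbln_cache_paths is a hypothetical helper, and it assumes ldconfig is present in the container image.

```shell
# rbln_cache_paths is a hypothetical helper, not part of the toolkit.
# It reads `ldconfig -p` output on stdin and prints the resolved path of
# every cache entry whose name contains "librbln-".
rbln_cache_paths() {
  awk '/librbln-/ && /=>/ { print $NF }'
}

# Example (run inside a container started with the CDI device):
#   ldconfig -p | rbln_cache_paths
```

With library isolation enabled, the printed paths would be expected to sit under the isolated directory (/rbln/lib64 in the example above).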

Systemd Integration

The DEB and RPM packages install two systemd units under /usr/lib/systemd/system/ and the postinstall script enables them automatically:

  • rbln-cdi-refresh.path — watches librbln-ml.so under the standard library paths and the rbln-ctk binary
  • rbln-cdi-refresh.service — oneshot unit that runs rbln-ctk cdi generate --output /var/run/cdi/rbln.yaml

A driver re-install or a toolkit upgrade triggers the path unit, which runs the service unit and rewrites the CDI spec on the host. Newly started containers bind to the current libraries without any operator action.

Verify the path watcher is active:

$ systemctl status rbln-cdi-refresh.path

To turn the refresher off (for example, on a host that drives CDI regeneration via a custom workflow):

$ sudo systemctl disable --now rbln-cdi-refresh.path

This systemd path applies to DEB/RPM installs only. On Kubernetes, the equivalent loop runs in-process inside rbln-ctk-daemon — see Automatic CDI Refresh.

Configuration File

The toolkit reads configuration from /etc/rbln/container-toolkit.yaml.

All CLI flags can also be set via environment variables with the prefix RBLN_CTK_ (e.g., --driver-root becomes RBLN_CTK_DRIVER_ROOT).
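
The naming rule can be expressed as a small function. This is a sketch of the mapping as described above; only the --driver-root example is confirmed by this page, so treating every long flag the same way is an assumption, and flag_to_env is a hypothetical helper.

```shell
# flag_to_env is a hypothetical helper illustrating the naming rule.
# It strips the leading "--", upper-cases the name, turns "-" into "_",
# and adds the RBLN_CTK_ prefix.
flag_to_env() {
  name=${1#--}
  printf 'RBLN_CTK_%s\n' "$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
}

# Example: flag_to_env --driver-root
# Setting the variable for a preview run (sudo drops most of the
# environment, hence env):
#   sudo env RBLN_CTK_DRIVER_ROOT=/host rbln-ctk cdi generate --dry-run
```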

Key configuration sections:

| Section | Controls |
|---|---|
| cdi | Output path, format, vendor/class names |
| libraries | Discovery patterns, plugin paths, container isolation path |
| tools | Which CLI tools to include (e.g., rbln-smi) |
| search-paths | Where to look for libraries and binaries |
| glibc-exclude | System libraries to exclude from CDI spec |
| hooks | CDI hook binary and ldconfig paths |

Troubleshooting

CDI spec not generated

# Verify RBLN driver is installed
$ ls /usr/lib64/librbln-*.so*

# Run with debug output
$ rbln-ctk cdi generate --debug

# Check what was discovered
$ rbln-ctk cdi list

Container cannot find RBLN libraries

# Verify hook is installed
$ ls -la /usr/local/bin/rbln-cdi-hook

# Regenerate CDI spec
$ sudo rbln-ctk cdi generate

Runtime not picking up changes

If the runtime is not recognizing CDI devices after configuration, try restarting it manually:

$ sudo systemctl restart containerd  # or crio, docker

Permission errors

Most operations require root access:

$ sudo rbln-ctk cdi generate
$ sudo rbln-ctk runtime configure

Next Steps

  • NPU Allocation — Learn how to allocate specific NPUs to containers using RSD groups