RBLN Container Toolkit
The RBLN Container Toolkit enables container runtimes to access Rebellions NPU devices using the Container Device Interface (CDI) specification. It automatically discovers host RBLN libraries and tools, generates CDI specs, and configures your container runtime, enabling containers to use NPU hardware with zero application changes.
Scope
The Container Toolkit is currently responsible only for CDI spec generation and runtime configuration. It exposes RBLN libraries and tools (such as rbln-smi) to containers via CDI. For NPU device allocation using RSD groups, see the NPU Allocation guide. These features will be integrated into a unified toolkit in a future release.
How It Works
The toolkit provides three binaries:
| Binary | Role |
|---|---|
| `rbln-ctk` | Main CLI: generates CDI specs, configures runtimes, inspects the system |
| `rbln-ctk-daemon` | Kubernetes daemon: automated setup with health endpoints and graceful shutdown |
| `rbln-cdi-hook` | OCI hook: runs inside containers to update ldcache and create symlinks |
The toolkit ships in two distribution flavors with intentionally different runtime profiles:
| Flavor | Target | Device injection | Refresh path |
|---|---|---|---|
| DEB / RPM package | Standalone Docker hosts | `rbln-ctk` injects `/dev/rbln*` and the matching `/dev/rsd*` group node via CDI. NPU↔RSD mapping is resolved through `librbln-ml`, which the package pulls in as a runtime dependency. | Host-side `rbln-cdi-refresh.path` systemd unit, auto-enabled by postinstall. See Systemd Integration. |
| Container image | Kubernetes DaemonSet | `rbln-ctk-daemon` skips device-node emission; device-plugin (or DRA) owns per-Pod `/dev/rbln*` allocation. Because the NPU↔RSD resolver is never invoked in this mode, the image is built without a `librbln-ml` dependency. | In-process loop inside `rbln-ctk-daemon`. See Automatic CDI Refresh. |
Prerequisites
| OS | Architecture | Container Runtime |
|---|---|---|
| Ubuntu 22.04/24.04 | x86_64 | containerd, CRI-O, Docker |
| RHEL 9+ | x86_64 | containerd, CRI-O, Docker |
- RBLN driver: install it on the host before the container toolkit. The toolkit's DEB / RPM package declares the driver's UMD library as a hard runtime dependency (`librbln-ml3` on Debian/Ubuntu, `librbln-ml` on RHEL-family hosts), so installing the toolkit on a host without the driver fails with an unmet-dependency error from `apt-get` or `dnf`.
Installation
Ubuntu / Debian
- Add the Rebellions official GPG key (skip if already configured):
- Add the repository to APT sources (skip if already configured):
- Update APT and install:
RHEL
- Register the Rebellions yum repository (skip if already configured):
- Refresh the cache and install:
Unsigned RPM packages
The current RPM packages are not yet signed, so gpgcheck=0 is required to install them. Switch the entry back to gpgcheck=1 once signed packages become available.
Quick Start
The quickest way to enable NPU access in containers:
The toolkit auto-detects your runtime and applies the appropriate configuration. To expose only a subset of the host's NPUs, see Device Selection.
Verify Setup
Preview Before Applying
Every command supports --dry-run to see what would change without modifying anything:
Device Selection
The CDI spec exposes two forms of device handle. Pick the one that matches the workload's scope:
| Handle | Selects |
|---|---|
| `rebellions.ai/npu=all` | Every NPU and every RSD group on the host (recommended) |
| `rebellions.ai/npu=N` | A single NPU by index (0, 1, ...). The NPU's RSD group is attached automatically. |
Use the =N handle to select a single NPU by index:
Repeat --device to attach multiple NPUs. When the selected NPUs share an RSD group, the container can run tensor-parallel workloads across them:
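The handle forms above compose mechanically into repeated `--device` arguments. The following is a minimal Python sketch of that composition. The helper name `cdi_device_args` is hypothetical and not part of the toolkit; it only formats strings and does not invoke any container runtime.

```python
from typing import Iterable, List, Union

# Vendor/class prefix taken from the handles in the table above.
CDI_KIND = "rebellions.ai/npu"


def cdi_device_args(selection: Union[str, Iterable[int]]) -> List[str]:
    """Build repeated --device arguments for an NPU selection.

    `selection` is either the string "all" or an iterable of NPU indices.
    Hypothetical helper for illustration only.
    """
    if isinstance(selection, str):
        if selection != "all":
            raise ValueError("string selection must be 'all'")
        names = ["all"]
    else:
        names = []
        for index in selection:
            # Reject bools, non-integers, and negative indices.
            if isinstance(index, bool) or not isinstance(index, int) or index < 0:
                raise ValueError(f"invalid NPU index: {index!r}")
            names.append(str(index))
        if not names:
            raise ValueError("select at least one NPU")

    args: List[str] = []
    for name in names:
        args += ["--device", f"{CDI_KIND}={name}"]
    return args


print(cdi_device_args([0, 1]))
# ['--device', 'rebellions.ai/npu=0', '--device', 'rebellions.ai/npu=1']
```

The resulting list can be spliced into a container launch command; the image and the rest of the command line are left to the caller.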
On Kubernetes runtimes (containerd / CRI-O), only the =all handle is published — it carries the library mounts and the rbln-smi tool bind, with no device nodes attached. Per-Pod /dev/rbln* allocation is owned by the device plugin or the DRA driver. The per-NPU handle applies to standalone Docker hosts.
Regenerate the CDI spec after RSD topology changes
rbln-ctk cdi generate binds each =N handle in the CDI spec to the /dev/rsd* nodes present on the host at generate time. The automatic refresh paths (Systemd Integration, Automatic CDI Refresh) fire only on driver or toolkit library changes. After re-grouping NPUs with rbln-smi group -c, -a, or -d, rerun sudo rbln-ctk cdi generate or wait for the next driver refresh so the container runtime can resolve the new =N handles. The =all handle remains valid across RSD group changes.
Compatibility with npu=runtime
rebellions.ai/npu=runtime from v0.1.x keeps working as an alias of =all with identical content, so existing manifests and device-plugin builds do not need to be rewritten when upgrading to v0.2. Prefer =all or a per-device selector for new manifests; the alias may be retired in a future release once downstream consumers have migrated.
CLI Reference
rbln-ctk cdi generate
Discovers RBLN libraries and tools, then writes a CDI spec.
| Flag | Description | Default |
|---|---|---|
| `-o, --output` | Output path | `/var/run/cdi/rbln.yaml` |
| `-f, --format` | Output format (`yaml` or `json`) | `yaml` |
| `--driver-root` | Root path for driver files (CoreOS: `/host`) | `/` |
| `--container-library-path` | Isolated library path in container | (same as host) |
| `--dry-run` | Preview without writing | `false` |
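For orientation, the sketch below assembles the general shape of a CDI spec with a single `all` device, using the field names defined by the CDI specification (`cdiVersion`, `kind`, `devices`, `containerEdits`, `mounts`, `deviceNodes`). It is an illustration only, not the toolkit's actual output: the `cdiVersion` value, the mount options, the file paths, and the helper name `build_all_device_spec` are all assumptions.

```python
import json
from typing import Dict, Iterable, List


def build_all_device_spec(
    library_paths: Iterable[str],
    tool_paths: Iterable[str],
    device_nodes: Iterable[str] = (),
) -> Dict:
    """Assemble a CDI spec dict that publishes one device named `all`."""
    mounts: List[Dict] = [
        {
            "hostPath": path,
            "containerPath": path,  # default mode: same path as on the host
            "options": ["ro", "nosuid", "nodev", "bind"],  # ASSUMPTION
        }
        for path in list(library_paths) + list(tool_paths)
    ]
    edits: Dict = {"mounts": mounts}
    nodes = [{"path": path} for path in device_nodes]
    if nodes:
        # Omitted entirely when no device nodes are passed (Kubernetes flavor).
        edits["deviceNodes"] = nodes
    return {
        "cdiVersion": "0.6.0",  # ASSUMPTION: any version the runtime accepts
        "kind": "rebellions.ai/npu",
        "devices": [{"name": "all", "containerEdits": edits}],
    }


spec = build_all_device_spec(
    library_paths=["/usr/lib/librbln-ml.so"],  # hypothetical path
    tool_paths=["/usr/bin/rbln-smi"],          # hypothetical path
    device_nodes=["/dev/rbln0"],               # hypothetical node
)
print(json.dumps(spec, indent=2))
```

Comparing this shape against the real output of `rbln-ctk cdi generate --dry-run` is the reliable way to see what the toolkit emits on a given host.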
rbln-ctk runtime configure
Auto-detects the active container runtime and enables CDI support.
| Flag | Description | Default |
|---|---|---|
| `-r, --runtime` | Force specific runtime (`containerd`, `crio`, `docker`) | (auto-detect) |
| `--config-path` | Custom runtime config path | (runtime default) |
| `--dry-run` | Preview changes | `false` |
rbln-ctk cdi list
Lists discovered RBLN libraries and tools.
rbln-ctk info
Displays system information including detected runtime and configuration.
Kubernetes Deployment
For Kubernetes clusters, deploy the toolkit as a DaemonSet. The daemon (rbln-ctk-daemon) handles the entire lifecycle:
- Generates CDI spec on startup
- Configures the container runtime
- Restarts the container runtime
- Serves health check endpoints
- Cleans up on SIGTERM (pod termination)
Container Image
The official container image is available on Docker Hub:
Deploy
Health Endpoints
| Endpoint | Probe Type | Returns 200 When |
|---|---|---|
| `/live` | Liveness | Daemon process is running |
| `/ready` | Readiness | Setup is complete |
| `/startup` | Startup | Initialization finished |
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `RBLN_CTK_DAEMON_RUNTIME` | Container runtime | (auto-detect) |
| `RBLN_CTK_DAEMON_HOST_ROOT` | Host root mount path | `/` (host), `/host` (container) |
| `RBLN_CTK_DAEMON_DRIVER_ROOT` | Driver root path for CDI spec | `/` |
| `RBLN_CTK_DAEMON_CDI_SPEC_DIR` | CDI spec directory | `/var/run/cdi` |
| `RBLN_CTK_DAEMON_CONTAINER_LIBRARY_PATH` | Container library path for library isolation | (empty) |
| `RBLN_CTK_DAEMON_SOCKET` | Runtime socket path | (auto-detect) |
| `RBLN_CTK_DAEMON_HEALTH_PORT` | Health check port | `8080` |
| `RBLN_CTK_DAEMON_SHUTDOWN_TIMEOUT` | Graceful shutdown timeout | `30s` |
| `RBLN_CTK_DAEMON_PID_FILE` | PID file path | `/run/rbln/toolkit.pid` |
| `RBLN_CTK_DAEMON_NO_CLEANUP_ON_EXIT` | Skip cleanup on exit | `false` |
| `RBLN_CTK_DAEMON_REFRESH_INTERVAL` | Poll interval for `rbln-ctk-daemon`'s in-process CDI spec refresh (`0` disables). DaemonSet-only; the DEB/RPM CLI path uses the systemd unit instead. See Automatic CDI Refresh. | `60s` |
| `RBLN_CTK_DAEMON_DEBUG` | Enable debug logging | `false` |
| `RBLN_CTK_DAEMON_FORCE` | Terminate existing instance before starting | `false` |
Automatic CDI Refresh
rbln-ctk-daemon (the Kubernetes DaemonSet binary) drives the refresh loop. On DEB/RPM hosts, the systemd unit described in Systemd Integration handles the same job.
The daemon polls the rbln version: marker embedded in the host's RBLN UMD libraries. When the daemon detects a marker change, it regenerates /var/run/cdi/rbln.yaml. Because the daemon tracks library changes automatically, operators can reinstall or upgrade the driver while keeping the DaemonSet in place. The container runtime mounts the current library into each new container at startup.
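The change-detection idea can be sketched in a few lines of Python. Only the literal `rbln version:` marker comes from the description above. How the value is encoded after the marker (assumed here to run up to the first NUL byte or newline), and the helper names `read_marker`, `snapshot`, and `needs_regeneration`, are assumptions made for illustration; the daemon's real implementation may differ.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

MARKER = b"rbln version:"


def read_marker(lib_path: Path) -> Optional[str]:
    """Return the text following the marker, or None if the marker is absent."""
    data = lib_path.read_bytes()
    start = data.find(MARKER)
    if start < 0:
        return None
    value_start = start + len(MARKER)
    # ASSUMPTION: the value ends at the first NUL byte or newline.
    value_end = len(data)
    for terminator in (b"\0", b"\n"):
        pos = data.find(terminator, value_start)
        if pos != -1:
            value_end = min(value_end, pos)
    return data[value_start:value_end].decode("ascii", errors="replace").strip()


def snapshot(lib_paths: Iterable[Path]) -> Dict[str, Optional[str]]:
    """Map each library path to its current marker value."""
    return {str(path): read_marker(path) for path in lib_paths}


def needs_regeneration(previous: Dict, current: Dict) -> bool:
    """A spec refresh is due when any marker (or the set of libraries) changed."""
    return previous != current
```

A polling loop would take a snapshot every refresh interval, compare it with the previous one, and regenerate the spec only when `needs_regeneration` returns true.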
RBLN_CTK_DAEMON_REFRESH_INTERVAL (or the --refresh-interval flag) controls the refresh period. The default is 60s; setting 0 disables polling. The daemon writes the spec to a temporary file, then swaps it in with fsync followed by rename, so the container runtime always reads a complete spec.
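The write-then-swap pattern described above is a standard technique and can be sketched in Python as follows. This illustrates the general pattern rather than the daemon's own code; the function name `write_spec_atomically` and the temporary-file naming are hypothetical.

```python
import os
import tempfile


def write_spec_atomically(path: str, content: str) -> None:
    """Write `content` to `path` so readers never observe a partial file."""
    directory = os.path.dirname(path) or "."
    # The temporary file must live in the same directory (same filesystem)
    # as the target, otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rbln-cdi-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data to disk before the swap
        os.replace(tmp_path, path)  # atomic rename over the old spec
    except BaseException:
        # Remove the temporary file if anything failed before the swap.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because the rename either fully succeeds or leaves the old file in place, a runtime that opens the spec at any moment sees either the previous complete version or the new complete version.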
The /ready endpoint response includes a cdi-refresh block that reports the last run time, the number of libraries discovered, and the most recent refresh error (if any).
Kubernetes Pod Example
CoreOS / OpenShift
For Red Hat CoreOS environments where the host filesystem is mounted at /host:
Advanced Configuration
Library Isolation
By default, RBLN libraries are bind-mounted at their host paths inside the container. If this causes conflicts (e.g., different glibc versions), use library isolation:
This mode:
- Mounts libraries to an isolated path (`/rbln/lib64`) instead of host paths
- Uses the CDI hook to run `ldconfig` inside the container at startup
- Avoids `LD_LIBRARY_PATH`; the ldcache handles library resolution natively
- Supports setuid binaries (which ignore `LD_LIBRARY_PATH`)
Systemd Integration
The DEB and RPM packages install two systemd units under /usr/lib/systemd/system/ and the postinstall script enables them automatically:
- `rbln-cdi-refresh.path`: watches `librbln-ml.so` under the standard library paths and the `rbln-ctk` binary
- `rbln-cdi-refresh.service`: oneshot unit that runs `rbln-ctk cdi generate --output /var/run/cdi/rbln.yaml`
A driver re-install or a toolkit upgrade triggers the path unit, which runs the service unit and rewrites the CDI spec on the host. Newly started containers bind to the current libraries without any operator action.
Verify the path watcher is active:
To turn the refresher off (for example, on a host that drives CDI regeneration via a custom workflow):
This systemd path applies to DEB/RPM installs only. On Kubernetes, the equivalent loop runs in-process inside rbln-ctk-daemon — see Automatic CDI Refresh.
Configuration File
The toolkit reads configuration from /etc/rbln/container-toolkit.yaml.
All CLI flags can also be set via environment variables with the prefix RBLN_CTK_ (e.g., --driver-root becomes RBLN_CTK_DRIVER_ROOT).
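The naming rule can be expressed as a small Python helper. The only mapping confirmed above is `--driver-root` to `RBLN_CTK_DRIVER_ROOT`; applying the same rule (strip leading dashes, replace `-` with `_`, uppercase, add the prefix) to other flags is an assumption, and `flag_to_env` is a hypothetical name.

```python
def flag_to_env(flag: str, prefix: str = "RBLN_CTK_") -> str:
    """Derive the environment variable name for a long CLI flag."""
    name = flag.lstrip("-")
    if not name:
        raise ValueError("flag must contain a name")
    return prefix + name.replace("-", "_").upper()


print(flag_to_env("--driver-root"))  # RBLN_CTK_DRIVER_ROOT
```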
Key configuration sections:
| Section | Controls |
|---|---|
| `cdi` | Output path, format, vendor/class names |
| `libraries` | Discovery patterns, plugin paths, container isolation path |
| `tools` | Which CLI tools to include (e.g., `rbln-smi`) |
| `search-paths` | Where to look for libraries and binaries |
| `glibc-exclude` | System libraries to exclude from CDI spec |
| `hooks` | CDI hook binary and ldconfig paths |
Troubleshooting
CDI spec not generated
Container cannot find RBLN libraries
Runtime not picking up changes
If the runtime is not recognizing CDI devices after configuration, try restarting it manually:
Permission errors
Most operations require root access:
Next Steps
- NPU Allocation — Learn how to allocate specific NPUs to containers using RSD groups