Device Monitoring (rbln-smi)¶
rbln-smi is a command-line interface (CLI) utility for monitoring and managing RBLN NPUs. It supports:
- NPU status monitoring (temperature, power, utilization, memory)
- Context and process inspection
- System topology inspection
- RSD group management and group-level settings
rbln-smi is included in the RBLN Driver package. For the full, version-specific option reference, run rbln-smi --help.
Note
rbln-stat is deprecated and replaced by rbln-smi. Existing scripts using rbln-stat may still work, but new users should use rbln-smi.
Quick Start¶
Expected output (example)
+-------------------------------------------------------------------------------------------------+
| Device Information KMD ver: N/A |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| 0 | RBLN-CA12 | rbln0 | 0000:51:00.0 | 38C | 43.9W | P2 | 2.4GB / 15.7GiB | 98.7 |
| 1 | RBLN-CA12 | rbln1 | 0000:d8:00.0 | 25C | 6.1W | P14 | 0.0B / 15.7GiB | 0.0 |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
+-------------------------------------------------------------------------------------------------+
| Context Information |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| 0 | python3 | 2928727 | 1 | min | 0 | 1.9GiB | run |
| 0 | python3 | 2930166 | 2 | min | 1 | 468.0MiB | idle |
| 0 | python3 | 2934705 | 3 | min | 2 | 88.0MiB | idle |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
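The table view above can be consumed from a script by splitting rows on the | separators. The following is a minimal sketch, assuming the layout shown in the example output; parse_tables is a hypothetical helper and not part of rbln-smi, and for automation the JSON output (-j) is likely the more robust choice.

```python
def parse_tables(text):
    """Split rbln-smi style ASCII output into tables of row dicts.

    Assumes the layout of the example above: a single-cell title row
    (e.g. "Device Information"), then a header row, then data rows.
    """
    tables, header = [], None
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # borders (+---+), timestamps, blank lines
        cells = [c.strip() for c in line.split("|")[1:-1]]
        if len(cells) == 1:
            header = None  # title row: the next multi-cell row is a header
            continue
        if header is None:
            header = cells
            tables.append([])
        elif len(cells) == len(header):
            tables[-1].append(dict(zip(header, cells)))
    return tables


sample = """\
| Device Information KMD ver: N/A |
+-----+-----------+---------+
| NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+-----+-----------+---------+
| 0 | RBLN-CA12 | rbln0 | 0000:51:00.0 | 38C | 43.9W | P2 | 2.4GB / 15.7GiB | 98.7 |
| 1 | RBLN-CA12 | rbln1 | 0000:d8:00.0 | 25C | 6.1W | P14 | 0.0B / 15.7GiB | 0.0 |
| Context Information |
| NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
| 0 | python3 | 2928727 | 1 | min | 0 | 1.9GiB | run |
| 0 | python3 | 2930166 | 2 | min | 1 | 468.0MiB | idle |
| 0 | python3 | 2934705 | 3 | min | 2 | 88.0MiB | idle |
"""

devices, contexts = parse_tables(sample)
busy = [d["Device"] for d in devices if float(d["Util"]) > 50.0]
```

Rows with empty cells (for example a blank Name or Power) still parse, because empty cells are kept as empty strings.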
Key Concepts and Terminology¶
Device selection¶
- Use -d, --device <ids> to target specific NPUs (comma-separated list or range).
- Output refers to device labels shown in the Device column (for example: rbln0, rbln1).
Output formats¶
- Table (default): Human-readable summary for devices and contexts.
- JSON (-j): Machine-readable output.
- Query (-q): Space-separated (CSV-like) output suitable for scripts.
Common columns and performance state¶
The default monitor output typically includes:
| Column | Meaning |
|---|---|
| Name | NPU product name (for example: RBLN-CA25). |
| Power | Power consumption. |
| Perf | Performance state (P-state). |
| Temp | Temperature (°C). |
| Util | Utilization. |
| PID | Process ID. |
| CTX | Context ID. |
| Memalloc | Allocated memory. |
For the authoritative product/family mapping and supported card list, see Support Matrix.
The Perf column reports a P-state. Common interpretations include:
| P-state | Clock (Neural Engine) | PCIe | Note |
|---|---|---|---|
| P2 | Nominal | Gen5 | - |
| P4 | Nominal | Gen4 | - |
| P6 | Half | Gen4 | - |
| P10 | Half | (No update) | Thermal Throttling |
| P12 | Minimal | (No update) | System Abort (Hang) |
| P14 | Off | (No update) | Idle |
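A monitoring script might encode the P-state table as a lookup so that throttling or hang states can be flagged. This is an illustrative sketch only: the values are copied from the table above, while PState, ALERT_STATES, and describe are hypothetical names, and treating P10 and P12 as alert states is an assumption based on their notes.

```python
from typing import NamedTuple, Optional


class PState(NamedTuple):
    clock: str
    pcie: Optional[str]  # None where the table says "(No update)"
    note: Optional[str]  # None where the table shows "-"


# Values transcribed from the P-state table above.
P_STATES = {
    "P2": PState("Nominal", "Gen5", None),
    "P4": PState("Nominal", "Gen4", None),
    "P6": PState("Half", "Gen4", None),
    "P10": PState("Half", None, "Thermal Throttling"),
    "P12": PState("Minimal", None, "System Abort (Hang)"),
    "P14": PState("Off", None, "Idle"),
}

# Assumption: these two states are the ones worth alerting on.
ALERT_STATES = {"P10", "P12"}


def describe(perf: str) -> str:
    """Return a one-line description of a Perf column value."""
    state = P_STATES.get(perf)
    if state is None:
        return f"{perf}: unknown P-state"
    parts = [f"clock={state.clock}"]
    if state.pcie:
        parts.append(f"pcie={state.pcie}")
    if state.note:
        parts.append(state.note)
    return f"{perf}: " + ", ".join(parts)


def needs_attention(perf: str) -> bool:
    return perf in ALERT_STATES
```

For example, describe("P10") yields a line mentioning thermal throttling, and needs_attention("P14") is False because an idle device is expected to be in P14.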
Command Reference¶
Note
Some operations require sudo, including the group, tdr, timeout, sort, and mknod subcommands.
General usage¶
Common invocation patterns:
Global options¶
The following options are available across all command modes:
| Option | Description |
|---|---|
| -h, --help | Display help information. Subcommand help requires sudo. |
| -b, --byte-format | Display values in raw units instead of human-readable units. |
| -j, --json | Render the result as JSON. |
| -q, --query | Print data in a space-separated (CSV-like) format. |
| -qd, --query-device <columns> | Select specific device columns when using query mode. |
| -qc, --query-context <columns> | Select specific context columns when using query mode. |
| -t, --topo | Show device/system topology (kernel 6.2 or later recommended). |
| -L, --list | List NPUs and their UUIDs. |
| -d, --device <ids> | Choose NPUs by comma-separated list or range. |
| -g, --group | Display output organized by RSD groups. |
| -v, --version | Print version information and exit. |
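When rbln-smi is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the long options listed in the table above; build_command and its parameters are hypothetical, and whether particular options may be combined is not covered here, so check rbln-smi --help for the installed version.

```python
def build_command(devices=None, as_json=False, group=False, byte_format=False):
    """Build an argv list for rbln-smi from a few global options.

    devices: iterable of NPU indices, joined as a comma-separated list.
    """
    argv = ["rbln-smi"]
    if as_json:
        argv.append("--json")
    if byte_format:
        argv.append("--byte-format")
    if group:
        argv.append("--group")
    if devices is not None:
        ids = ",".join(str(int(d)) for d in devices)
        if not ids:
            raise ValueError("devices must not be empty")
        argv += ["--device", ids]
    return argv


cmd = build_command(devices=[0, 1], as_json=True)
```

On a host with the driver installed, the resulting list could be passed to subprocess.run(cmd, capture_output=True, text=True, check=True) and, for JSON output, the captured stdout handed to json.loads.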
CLI Examples¶
Basic commands¶
Summary
Default view. Shows a snapshot of device and context information.
Command
Output (excerpt)
Mon Nov 17 14:15:26 2025
+-------------------------------------------------------------------------------------------------+
| Device Information KMD ver: 2.1.0~dev.107+gafec0b9 |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 0 | RBLN-CA25 | rbln0 | 0000:47:00.0 | 37C | 75.9W | P2 | 90.0MiB / 15.7GiB | 50.0 |
| 1 | | rbln1 | 0000:48:00.0 | 26C | | P14 | 0.0B / 15.7GiB | 0.0 |
+-------------------------------------------------------------------------------------------------+
| Context Information |
+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
| NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
+=====+=====================+==============+===========+==========+======+===============+========+
| 0 | command_submission | 257082 | 10001 | min | 0 | 90.0MiB | run |
Summary
Prints the result in JSON format.
Command
Output (excerpt)
Summary
Prints the result in a space-separated (CSV-like) format.
Command
Output (excerpt)
Summary
Restricts output to the specified NPUs only.
Command
Output (excerpt)
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 0 | RBLN-CA25 | rbln0 | 0000:05:00.0 | 24C | 41.6W | P14 | 0.0B / 15.7GiB | 0.0 |
| 1 | | rbln1 | 0000:06:00.0 | 25C | | P14 | 0.0B / 15.7GiB | 0.0 |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
Summary
Prints device and system topology information, including the distance matrix.
Command
Output (excerpt)
Subcommands (sudo required)¶
Summary
Create or destroy RSD groups.
Command
Options
| Argument | Description |
|---|---|
| -c, --create <group_id> | Create a new RSD group (use with -a). Specifying all assigns one group per device. |
| -a, --attach <npu_ids> | Attach NPUs (comma-separated) to the new group. |
| -d, --destroy <group_ids> | Remove one or more RSD groups. NPUs from a removed group are merged back into group 0. Accepts a single ID, or multiple IDs separated by , or : (the two separators may be mixed, e.g. 1,2:3). Specifying all removes every group. |
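The ID syntax accepted by --destroy can be modeled in a few lines, which is handy for validating input before invoking the command. This is a sketch of the documented syntax only: split_group_ids is a hypothetical helper, and the tool's own parser may treat whitespace, duplicates, or empty fields differently.

```python
import re


def split_group_ids(spec):
    """Split a --destroy argument such as "1,2:3" into integer group IDs.

    Returns the string "all" unchanged, since it selects every group.
    """
    if spec == "all":
        return "all"
    ids = []
    for part in re.split(r"[,:]", spec):  # "," and ":" may be mixed
        if not part.isdigit():
            raise ValueError(f"invalid group id: {part!r}")
        ids.append(int(part))
    return ids
```

For instance, split_group_ids("1,2:3") returns [1, 2, 3], while an empty field such as "1,,3" raises ValueError in this sketch.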
Note
When multiple IDs are passed to --destroy, each group is processed independently, so a failure on one ID does not stop the others. Failed IDs are reported on stderr as skip destroying group <id>: <reason>. The command then exits non-zero with a final failed to destroy group(s): <ids> message that lists every ID that did not complete.
Example session
$ sudo rbln-smi group -c 1 -a 0,1
RSD group 1 created with devices [0,1].
$ rbln-smi --group
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| Grp | NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+=====+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 1 | 0 | RBLN-CA25 | rbln0 | 0000:05:00.0 | 23C | 41.0W | P14 | 0.0B / 15.7GiB | 0.0 |
| 1 | 1 | | rbln1 | 0000:06:00.0 | 25C | | P14 | 0.0B / 15.7GiB | 0.0 |
+=====+=====+===========+=========+===============+======+=========+======+=====================+=======+
$ sudo rbln-smi group -d 1
RSD group 1 destroyed.
# Destroy multiple groups in a single call (comma- or colon-separated).
$ sudo rbln-smi group -d 1,2,3
RSD group 1 destroyed.
RSD group 2 destroyed.
RSD group 3 destroyed.
$ sudo rbln-smi group -d 1:2:3
RSD group 1 destroyed.
RSD group 2 destroyed.
RSD group 3 destroyed.
# Per-ID aggregation: a failure on one ID does not block the rest.
$ sudo rbln-smi group -d 1,99,3
RSD group 1 destroyed.
skip destroying group 99: <reason>
RSD group 3 destroyed.
failed to destroy group(s): 99
$ sudo rbln-smi group -c all
All RSD groups created.
$ sudo rbln-smi group -d all
All RSD groups destroyed.
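A wrapper script can recover the failed IDs from the messages shown in the session above. The following sketch matches only the two message shapes documented here; parse_destroy_errors is a hypothetical helper, and because the separator used in the final ID list is not shown for more than one ID, the code simply extracts every number on that line.

```python
import re

SKIP_RE = re.compile(r"^skip destroying group (\d+): (.*)$")
FAILED_RE = re.compile(r"^failed to destroy group\(s\): (.*)$")


def parse_destroy_errors(text):
    """Return (reasons, failed) parsed from group --destroy error output.

    reasons: dict mapping group ID -> reason string from each skip line.
    failed:  list of IDs from the final summary line (empty if absent).
    """
    reasons, failed = {}, []
    for line in text.splitlines():
        line = line.strip()
        skip = SKIP_RE.match(line)
        if skip:
            reasons[int(skip.group(1))] = skip.group(2)
            continue
        summary = FAILED_RE.match(line)
        if summary:
            failed = [int(tok) for tok in re.findall(r"\d+", summary.group(1))]
    return reasons, failed


captured = """\
RSD group 1 destroyed.
skip destroying group 99: <reason>
RSD group 3 destroyed.
failed to destroy group(s): 99
"""
reasons, failed = parse_destroy_errors(captured)
```

Success lines such as RSD group 1 destroyed. are ignored, so the same function works whether stdout and stderr are captured separately or merged.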
Summary
Enable or disable Timeout Detection and Recovery (TDR) for an RSD group. When TDR is enabled, the driver detects a hung job once the per-group timeout (see timeout) expires and runs an NPU reset cycle to recover the device; when disabled, an expired job is timed out but the NPU is not reset automatically.
Command
Notes
<value> is 0 (disable) or 1 (enable); any non-zero value is treated as enable. The reset threshold itself is configured separately with the timeout subcommand. On success the command prints no output.
Output (example)
Summary
Set the TDR timeout — the time (in seconds) the driver waits for a job to complete before treating it as hung. The threshold applies per RSD group and is consumed by the TDR mechanism described under tdr.
Command
Notes
<value> is a non-negative integer in seconds. 0 is a special value: the driver treats it as no timeout, so a job waits indefinitely. The reset cycle only runs when TDR is enabled — if tdr is 0, the timeout still trips but the NPU is not reset automatically. On success the command prints no output.
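The interaction between the tdr and timeout settings can be summarized as a small decision function. This is an illustrative model of the behavior described above, not driver code: job_outcome and its return strings are hypothetical, and treating a job as expired exactly when the elapsed time reaches the timeout is an assumption.

```python
def job_outcome(elapsed_s, timeout_s, tdr_enabled):
    """Model what happens to a job under the per-group TDR settings.

    elapsed_s:   seconds the job has been running.
    timeout_s:   value set with the timeout subcommand (0 means no timeout).
    tdr_enabled: True if tdr was set to a non-zero value.
    """
    if timeout_s < 0:
        raise ValueError("timeout must be a non-negative integer")
    if timeout_s == 0:
        return "waiting"  # 0 disables the timeout: the job waits indefinitely
    if elapsed_s < timeout_s:  # boundary handling is an assumption
        return "waiting"
    if tdr_enabled:
        return "timed out, NPU reset"
    return "timed out, no automatic reset"
```

With a timeout of 30 seconds, a job at 31 seconds is timed out in both cases; only the tdr setting decides whether the reset cycle runs.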
Output (example)
Summary
Sort NPU devices by PCI BDF and rebind them.
Command
Notes
On success, this command typically prints no output.
Output (example)
Summary
Create /dev/rbln* character device nodes for the NPUs visible through the RSD control device (--rsd). This is useful in containers or initramfs environments where udev does not populate /dev/rbln* automatically. The command is idempotent: pre-existing nodes are silently skipped.
Command
Options
| Argument | Description |
|---|---|
| --rsd <path> | RSD control device used to enumerate the NPUs. Default /dev/rsd0. |
| --dev-dir <path> | Directory in which to create the device nodes. Default /dev. |
Notes
Nodes are created with mode 0666, matching the host's udev defaults. The process umask is cleared while the nodes are created and restored on exit, so other files in --dev-dir are not affected.
Inside containers the caller needs CAP_MKNOD — the Linux capability that authorizes mknod(2) for character and block special files (see capabilities(7)); for example, run the container with --cap-add=MKNOD. Other mknod(2) errors (for example, EPERM when CAP_MKNOD is missing) are reported on stderr and the command exits non-zero.
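The node-creation behavior described in these notes (mode 0666, umask cleared and then restored, existing nodes skipped) can be sketched with the Python standard library. This is not how rbln-smi is implemented: create_nodes is a hypothetical function, the major/minor numbers must be supplied by the caller because this page does not describe how they are obtained from the RSD control device, and the injectable mknod parameter exists only so the logic can be exercised without root.

```python
import os
import stat


def create_nodes(dev_dir, nodes, mknod=os.mknod):
    """Create character device nodes, skipping ones that already exist.

    nodes: mapping of node name -> (major, minor); values are
           caller-supplied placeholders in this sketch.
    Returns the list of paths that were newly created.
    """
    created = []
    old_umask = os.umask(0)  # clear umask so 0666 is applied exactly
    try:
        for name, (major, minor) in sorted(nodes.items()):
            path = os.path.join(dev_dir, name)
            try:
                mknod(path, stat.S_IFCHR | 0o666, os.makedev(major, minor))
            except FileExistsError:
                continue  # idempotent: pre-existing nodes are skipped
            created.append(path)
    finally:
        os.umask(old_umask)  # restore the caller's umask on exit
    return created
```

Any other OSError, such as PermissionError when CAP_MKNOD is missing, propagates to the caller, which mirrors the documented non-zero exit.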
Output (example)
Troubleshooting¶
Permission denied / subcommand requires sudo¶
Run the command with sudo (for example: sudo rbln-smi group -h).
No devices are shown¶
- Confirm the driver is installed and NPUs are detected.
- Try rbln-smi -L to list devices.
Topology output is missing¶
If --topo output is unavailable or incomplete, check the kernel version (6.2 or later is recommended) and try again with specific devices via --device <ids>.
See also¶
- rblnBandwidthLatencyTest: host-to-NPU and NPU-to-NPU performance benchmark
- rbln-bios: system validation
- rbln-flash: firmware update tool