Device Monitoring (rbln-smi)

rbln-smi is a command-line interface (CLI) utility for monitoring and managing RBLN NPUs. It supports:

  • NPU status monitoring (temperature, power, utilization, memory)
  • Context and process inspection
  • System topology inspection
  • RSD group management and group-level settings

rbln-smi is included in the RBLN Driver package. For the full, version-specific option reference, run rbln-smi --help.

Note

rbln-stat is deprecated and replaced by rbln-smi. Existing scripts using rbln-stat may still work, but new users should use rbln-smi.

Quick Start

$ rbln-smi

Expected output (example)

Monitor output (example)
+-------------------------------------------------------------------------------------------------+
|                                 Device Information KMD ver: N/A                                 |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| NPU |    Name   | Device  |   PCI BUS ID  | Temp |  Power  | Perf |  Memory(used/total) |  Util |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| 0   | RBLN-CA12 | rbln0   |  0000:51:00.0 |  38C |  43.9W  | P2   |  2.4GiB / 15.7GiB   |  98.7 |
| 1   | RBLN-CA12 | rbln1   |  0000:d8:00.0 |  25C |   6.1W  | P14  |    0.0B / 15.7GiB   |   0.0 |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
+-------------------------------------------------------------------------------------------------+
|                                       Context Information                                       |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process             |     PID      | CTX | Priority | PTID |            Memalloc | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| 0   | python3             |   2928727    |  1  |   min    |  0   |              1.9GiB |  run   |
| 0   | python3             |   2930166    |  2  |   min    |  1   |            468.0MiB |  idle  |
| 0   | python3             |   2934705    |  3  |   min    |  2   |             88.0MiB |  idle  |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+

Key Concepts and Terminology

Device selection

  • Use -d, --device <ids> to target specific NPUs (comma-separated list or range).
  • Output identifies each NPU by the device label shown in the Device column (for example: rbln0, rbln1).

Output formats

  • Table (default): Human-readable summary for devices and contexts.
  • JSON (-j): Machine-readable output.
  • Query (-q): Space-separated (CSV-like) output suitable for scripts.

Common columns and performance state

The default monitor output typically includes:

Column    Meaning
Name      NPU product name (for example: RBLN-CA25).
Power     Power consumption.
Perf      Performance state (P-state).
Temp      Temperature (°C).
Util      Utilization.
PID       Process ID.
CTX       Context ID.
Memalloc  Allocated memory.

For the authoritative product/family mapping and supported card list, see Support Matrix.

The Perf column reports a P-state. Common interpretations include:

P-state  Clock (Neural Engine)  PCIe         Note
P2       Nominal                Gen5         -
P4       Nominal                Gen4         -
P6       Half                   Gen4         -
P10      Half                   (No update)  Thermal Throttling
P12      Minimal                (No update)  System Abort (Hang)
P14      Off                    (No update)  Idle
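
For scripts that react to the Perf column, the table above can be encoded as a lookup. The following is a minimal Python sketch, not part of rbln-smi: the names PSTATE_INFO, ALERT_NOTES, describe_pstate, and needs_attention are hypothetical, and the choice of which notes count as alerts is an assumption made for illustration.

```python
# Interpretations copied from the P-state table; they are "common
# interpretations", not a guaranteed contract.
# Tuple layout: (neural engine clock, PCIe generation or None, note or None).
# None for PCIe corresponds to "(No update)" in the table.
PSTATE_INFO = {
    "P2": ("Nominal", "Gen5", None),
    "P4": ("Nominal", "Gen4", None),
    "P6": ("Half", "Gen4", None),
    "P10": ("Half", None, "Thermal Throttling"),
    "P12": ("Minimal", None, "System Abort (Hang)"),
    "P14": ("Off", None, "Idle"),
}

# Assumption: a monitoring script treats these notes as alert conditions.
ALERT_NOTES = {"Thermal Throttling", "System Abort (Hang)"}


def describe_pstate(pstate: str) -> str:
    """Return a one-line description of a Perf value such as 'P2'."""
    key = pstate.strip()
    info = PSTATE_INFO.get(key)
    if info is None:
        return f"{key}: unknown P-state"
    clock, pcie, note = info
    parts = [f"clock={clock}"]
    if pcie:
        parts.append(f"PCIe={pcie}")
    if note:
        parts.append(f"note={note}")
    return f"{key}: " + ", ".join(parts)


def needs_attention(pstate: str) -> bool:
    """True when the P-state's note is one of the assumed alert conditions."""
    info = PSTATE_INFO.get(pstate.strip())
    return info is not None and info[2] in ALERT_NOTES
```

A P-state that is not in the table is reported as unknown rather than guessed, since the list above covers only common interpretations.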

Command Reference

Note

Some operations require sudo, including the group, tdr, timeout, sort, and mknod subcommands.

General usage

Common invocation patterns:

$ rbln-smi [global options]
$ rbln-smi --topo [options]
$ sudo rbln-smi <subcommand> [arguments]

Global options

The following options are available across all command modes:

Option                          Description
-h, --help                      Display help information. Subcommand help requires sudo.
-b, --byte-format               Display values in raw units instead of human-readable units.
-j, --json                      Render the result as JSON.
-q, --query                     Print data in a space-separated (CSV-like) format.
-qd, --query-device <columns>   Select specific device columns when using query mode.
-qc, --query-context <columns>  Select specific context columns when using query mode.
-t, --topo                      Show device/system topology (kernel 6.2 or later recommended).
-L, --list                      List NPUs and their UUIDs.
-d, --device <ids>              Choose NPUs by comma-separated list or range.
-g, --group                     Display output organized by RSD groups.
-v, --version                   Print version information and exit.

CLI Examples

Basic commands

Summary

Default view. Shows a snapshot of device and context information.

Command

$ rbln-smi [options]

Output (excerpt)

Monitor (excerpt)
Mon Nov 17 14:15:26 2025
+-------------------------------------------------------------------------------------------------+
|                        Device Information KMD ver: 2.1.0~dev.107+gafec0b9                       |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| NPU |    Name   | Device  |   PCI BUS ID  | Temp |  Power  | Perf |  Memory(used/total) |  Util |
+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 0   | RBLN-CA25 | rbln0   |  0000:47:00.0 |  37C |  75.9W  |  P2  |  90.0MiB / 15.7GiB  |  50.0 |
| 1   |           | rbln1   |  0000:48:00.0 |  26C |         | P14  |    0.0B / 15.7GiB   |   0.0 |
+-------------------------------------------------------------------------------------------------+
|                                       Context Information                                       |
+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
| NPU | Process             |     PID      |    CTX    | Priority | PTID |      Memalloc | Status |
+=====+=====================+==============+===========+==========+======+===============+========+
| 0   | command_submission  |    257082    |   10001   |   min    |  0   |       90.0MiB |  run   |

Summary

Prints the result in JSON format.

Command

$ rbln-smi -j

Output (excerpt)

JSON output (excerpt)
{
  "KMD_version": "2.1.0~dev.107+gafec0b9",
  "devices": [
    {
      "npu": 0,
      "name": "RBLN-CA25",
      "sid": "SAMPLE_SID_0000",
      "uuid": "11111111-2222-3333-4444-555555555555",
      "device": "rbln0",
      "temperature": "24C",
      "card_power": "41448540uW",
      "pstate": "P14",
      "memory": { "used": "0", "total": "16877879296" },
      "util": "0.0"
    }
  ],
  "contexts": []
}
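
The JSON form is the easiest one to consume from a script. The sketch below is a hedged illustration in Python, not an official client: the helper names (load_status, power_watts, summarize) are hypothetical, and the field formats (card_power as a string ending in uW, memory sizes as byte counts in strings) are assumptions read off the excerpt above and may differ between driver versions.

```python
import json
import subprocess

# Sample copied from the JSON excerpt so the sketch runs without an NPU.
SAMPLE_JSON = """
{
  "KMD_version": "2.1.0~dev.107+gafec0b9",
  "devices": [
    {
      "npu": 0,
      "name": "RBLN-CA25",
      "sid": "SAMPLE_SID_0000",
      "uuid": "11111111-2222-3333-4444-555555555555",
      "device": "rbln0",
      "temperature": "24C",
      "card_power": "41448540uW",
      "pstate": "P14",
      "memory": { "used": "0", "total": "16877879296" },
      "util": "0.0"
    }
  ],
  "contexts": []
}
"""


def load_status(text=None):
    """Parse rbln-smi JSON; runs `rbln-smi -j` when no text is supplied."""
    if text is None:
        result = subprocess.run(
            ["rbln-smi", "-j"], check=True, capture_output=True, text=True
        )
        text = result.stdout
    return json.loads(text)


def power_watts(value):
    """Convert a string such as '41448540uW' to watts (assumed format)."""
    if not value.endswith("uW"):
        raise ValueError(f"unexpected power format: {value!r}")
    return int(value[:-2]) / 1_000_000


def summarize(status):
    """Return one small dict per device with numeric fields."""
    rows = []
    for dev in status["devices"]:
        mem = dev["memory"]
        rows.append({
            "device": dev["device"],
            "pstate": dev["pstate"],
            "power_w": power_watts(dev["card_power"]),
            "mem_total_gib": int(mem["total"]) / 2**30,
            "util": float(dev["util"]),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(load_status(SAMPLE_JSON)):
        print(row)
```

Calling load_status() with no argument invokes the real tool. Passing text keeps the parsing logic testable on machines without the driver.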

Summary

Prints the result in a space-separated (CSV-like) format.

Command

$ rbln-smi -q

Output (excerpt)

Query output (excerpt)
driver_version:
  2.1.0~dev.107+gafec0b9
devices:
 npu      name        sid             uuid                                  device status ...
   0 RBLN-CA25 SAMPLE_SID_0000 11111111-2222-3333-4444-555555555555          rbln0 normal ...
   1 RBLN-CA25 SAMPLE_SID_0001 66666666-7777-8888-9999-AAAAAAAAAAAA          rbln1 normal ...
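
Query output can be split on whitespace when no field is empty or contains spaces. The following Python sketch assumes the layout shown in the excerpt (a devices: line, then a header row, then one row per NPU). The function name parse_query_devices is hypothetical, and the sample omits the columns that the excerpt elides with "...".

```python
# Trimmed copy of the query excerpt; elided columns are left out.
SAMPLE_QUERY = """\
driver_version:
  2.1.0~dev.107+gafec0b9
devices:
 npu      name        sid             uuid                                  device status
   0 RBLN-CA25 SAMPLE_SID_0000 11111111-2222-3333-4444-555555555555          rbln0 normal
   1 RBLN-CA25 SAMPLE_SID_0001 66666666-7777-8888-9999-AAAAAAAAAAAA          rbln1 normal
"""


def parse_query_devices(text: str) -> list[dict[str, str]]:
    """Return one dict per device row, keyed by the header names."""
    lines = text.splitlines()
    start = next(
        (i for i, ln in enumerate(lines) if ln.strip() == "devices:"), None
    )
    if start is None or start + 1 >= len(lines):
        return []
    header = lines[start + 1].split()
    rows = []
    for ln in lines[start + 2:]:
        fields = ln.split()
        if not fields or ln.rstrip().endswith(":"):
            break  # blank line or the start of another section
        if len(fields) != len(header):
            # An empty field would shift the columns; fail loudly instead
            # of returning misaligned data.
            raise ValueError(f"cannot align row with header: {ln!r}")
        rows.append(dict(zip(header, fields)))
    return rows
```

When a field can be blank, as the Name column is for some NPUs in the table view, the JSON output is the safer input for scripts.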

Summary

Restricts output to the specified NPUs only.

Command

$ rbln-smi -d 0,1

Output (excerpt)

Device filter (excerpt)
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| NPU |    Name   | Device  |   PCI BUS ID  | Temp |  Power  | Perf |  Memory(used/total) |  Util |
+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 0   | RBLN-CA25 | rbln0   |  0000:05:00.0 |  24C |  41.6W  | P14  |    0.0B / 15.7GiB   |   0.0 |
| 1   |           | rbln1   |  0000:06:00.0 |  25C |         | P14  |    0.0B / 15.7GiB   |   0.0 |
+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+

Summary

Prints device and system topology information, including the distance matrix.

Command

$ rbln-smi --topo [--device <ids>]

Output (excerpt)

Topology (example)
Hardware Topology
Device Distance  n0  n1  n2  n3
rbln0        n0   0   4   4   4
rbln1        n1   4   0   4   4
rbln2        n2   4   4   0   4
rbln3        n3   4   4   4   0
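
The distance matrix can be loaded into a nested mapping, for example to pick the closest peers of a device. This is a rough Python sketch that assumes the text layout of the example above. parse_topology and peers_by_distance are hypothetical helper names, and the meaning of individual distance values is not defined here.

```python
# Copy of the topology example so the sketch runs standalone.
SAMPLE_TOPO = """\
Hardware Topology
Device Distance  n0  n1  n2  n3
rbln0        n0   0   4   4   4
rbln1        n1   4   0   4   4
rbln2        n2   4   4   0   4
rbln3        n3   4   4   4   0
"""


def parse_topology(text: str) -> dict[str, dict[str, int]]:
    """Return {device: {peer_device: distance}} from --topo style text."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header_idx = next(
        (i for i, f in enumerate(lines) if f[:2] == ["Device", "Distance"]),
        None,
    )
    if header_idx is None:
        raise ValueError("no 'Device Distance' header found")
    labels = lines[header_idx][2:]  # column labels such as n0, n1, ...
    rows = {}
    label_to_device = {}
    for fields in lines[header_idx + 1:]:
        if len(fields) != len(labels) + 2:
            break  # end of the matrix
        device, label, *dists = fields
        label_to_device[label] = device
        rows[device] = [int(d) for d in dists]
    # Re-key columns by device name; keep the raw label if no row matches.
    return {
        device: {
            label_to_device.get(lbl, lbl): dist
            for lbl, dist in zip(labels, dists)
        }
        for device, dists in rows.items()
    }


def peers_by_distance(matrix, device):
    """List (distance, peer) pairs for `device`, nearest first."""
    return sorted(
        (dist, peer) for peer, dist in matrix[device].items() if peer != device
    )
```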

Summary

Lists NPUs and their UUIDs.

Command

$ rbln-smi -L

Output (excerpt)

-L (excerpt)
NPU 0: UUID: 11111111-2222-3333-4444-555555555555 (example)
NPU 1: UUID: 66666666-7777-8888-9999-AAAAAAAAAAAA (example)
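
To map NPU indices to UUIDs in a script, the -L lines can be matched with a regular expression. The sketch below assumes the "NPU <n>: UUID: <uuid>" layout from the excerpt. parse_npu_list is a hypothetical name, and any text after the UUID is ignored.

```python
import re

# Copy of the -L excerpt; the trailing "(example)" marker is ignored.
SAMPLE_LIST = """\
NPU 0: UUID: 11111111-2222-3333-4444-555555555555 (example)
NPU 1: UUID: 66666666-7777-8888-9999-AAAAAAAAAAAA (example)
"""

# A UUID in its usual 8-4-4-4-12 hexadecimal form.
_LIST_LINE = re.compile(
    r"^NPU\s+(\d+):\s+UUID:\s+"
    r"([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
)


def parse_npu_list(text: str) -> dict[int, str]:
    """Return {npu_index: uuid} for every line that matches the layout."""
    result = {}
    for line in text.splitlines():
        match = _LIST_LINE.match(line.strip())
        if match:
            result[int(match.group(1))] = match.group(2)
    return result
```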

Subcommands (sudo required)

Summary

The group subcommand creates or destroys RSD groups.

Command

$ sudo rbln-smi group [-c <group_id> -a <npu_ids>] [-d <group_ids>]

Options

Argument                   Description
-c, --create <group_id>    Create a new RSD group (use with -a). Specifying all assigns one group per device.
-a, --attach <npu_ids>     Attach NPUs (comma-separated) to the new group.
-d, --destroy <group_ids>  Remove one or more RSD groups. NPUs from a removed group are merged back into group 0. Accepts a single ID, or multiple IDs separated by , or : (the two separators may be mixed, e.g. 1,2:3). Specifying all removes every group.

Note

When multiple IDs are passed to --destroy, each group is processed independently, and a failure on one ID does not stop the others. Each failed ID is reported on stderr as "skip destroying group <id>: <reason>". The command then exits non-zero with a final "failed to destroy group(s): <ids>" message that lists every ID that did not complete.
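
A wrapper script may want to know which IDs a --destroy call covers and which of them failed. The Python sketch below is illustrative only: split_group_ids mirrors the documented separators (, and :) and failed_groups parses the documented "skip destroying group <id>: <reason>" lines. Both function names are hypothetical, and no actual reason strings are assumed.

```python
import re


def split_group_ids(spec: str) -> list[str]:
    """Split a --destroy argument such as '1,2:3' into ['1', '2', '3'].

    The keyword 'all' is passed through unchanged.
    """
    if spec == "all":
        return ["all"]
    return [part for part in re.split(r"[,:]", spec) if part]


# Matches the per-ID failure line documented for --destroy.
_SKIP_LINE = re.compile(r"^skip destroying group (\S+): (.*)$")


def failed_groups(stderr_text: str) -> dict[str, str]:
    """Return {group_id: reason} for each 'skip destroying group' line."""
    failures = {}
    for line in stderr_text.splitlines():
        match = _SKIP_LINE.match(line.strip())
        if match:
            failures[match.group(1)] = match.group(2)
    return failures
```

The final "failed to destroy group(s): <ids>" line is deliberately not parsed, because the separator used in its ID list is not specified here. The exit status remains the authoritative success signal.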

Example session

Group workflow (example)
$ sudo rbln-smi group -c 1 -a 0,1
RSD group 1 created with devices [0,1].

$ rbln-smi --group
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| Grp | NPU |    Name   | Device  |   PCI BUS ID  | Temp |  Power  | Perf |  Memory(used/total) |  Util |
+=====+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 1   | 0   | RBLN-CA25 | rbln0   |  0000:05:00.0 |  23C |  41.0W  | P14  |    0.0B / 15.7GiB   |   0.0 |
| 1   | 1   |           | rbln1   |  0000:06:00.0 |  25C |         | P14  |    0.0B / 15.7GiB   |   0.0 |
+=====+=====+===========+=========+===============+======+=========+======+=====================+=======+

$ sudo rbln-smi group -d 1
RSD group 1 destroyed.

# Destroy multiple groups in a single call (comma- or colon-separated).
$ sudo rbln-smi group -d 1,2,3
RSD group 1 destroyed.
RSD group 2 destroyed.
RSD group 3 destroyed.

$ sudo rbln-smi group -d 1:2:3
RSD group 1 destroyed.
RSD group 2 destroyed.
RSD group 3 destroyed.

# Per-ID aggregation: a failure on one ID does not block the rest.
$ sudo rbln-smi group -d 1,99,3
RSD group 1 destroyed.
skip destroying group 99: <reason>
RSD group 3 destroyed.
failed to destroy group(s): 99

$ sudo rbln-smi group -c all
All RSD groups created.

$ sudo rbln-smi group -d all
All RSD groups destroyed.

Summary

The tdr subcommand enables or disables Timeout Detection and Recovery (TDR) for an RSD group. When TDR is enabled, the driver detects a hung job once the per-group timeout (see timeout) expires and runs an NPU reset cycle to recover the device. When TDR is disabled, an expired job is timed out but the NPU is not reset automatically.

Command

$ sudo rbln-smi tdr -g <group_ids> -v <value>

Notes

<value> is 0 (disable) or 1 (enable); any non-zero value is treated as enable. The reset threshold itself is configured separately with the timeout subcommand. On success the command prints no output.

Output (example)

TDR set (example)
$ sudo rbln-smi tdr -g 1 -v 1
# (no output on success)

Summary

The timeout subcommand sets the TDR timeout: the time, in seconds, that the driver waits for a job to complete before treating it as hung. The threshold applies per RSD group and is consumed by the TDR mechanism described under tdr.

Command

$ sudo rbln-smi timeout -g <group_ids> -v <value>

Notes

<value> is a non-negative integer in seconds. 0 is a special value: the driver treats it as no timeout, so a job waits indefinitely. The reset cycle runs only when TDR is enabled; if tdr is 0, the timeout still trips but the NPU is not reset automatically. On success the command prints no output.
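
The interaction between the tdr and timeout settings can be summarized as a small decision function. This Python sketch only restates the rules described above and is not driver code. The function name hung_job_outcome and the wording of the returned strings are hypothetical.

```python
def hung_job_outcome(tdr_enabled: bool, timeout_s: int) -> str:
    """Describe what happens to a job that never completes.

    tdr_enabled corresponds to the tdr value (non-zero means enabled);
    timeout_s corresponds to the timeout value in seconds.
    """
    if timeout_s < 0:
        raise ValueError("timeout must be a non-negative integer")
    if timeout_s == 0:
        # 0 means "no timeout": the job is never treated as hung.
        return "waits indefinitely"
    if tdr_enabled:
        return f"timed out after {timeout_s}s, NPU reset cycle runs"
    return f"timed out after {timeout_s}s, no automatic NPU reset"


if __name__ == "__main__":
    for tdr in (False, True):
        for timeout in (0, 10):
            print(f"tdr={int(tdr)} timeout={timeout}: "
                  f"{hung_job_outcome(tdr, timeout)}")
```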

Output (example)

Timeout set (example)
$ sudo rbln-smi timeout -g 1 -v 10
# (no output on success)

Summary

The sort subcommand sorts NPU devices by PCI BDF and rebinds them.

Command

$ sudo rbln-smi sort

Notes

On success, this command typically prints no output.

Output (example)

Sort (example)
$ sudo rbln-smi sort
# (no output on success)

Summary

The mknod subcommand creates /dev/rbln* character device nodes for the NPUs visible through the RSD control device (--rsd). It is useful in containers or initramfs environments where udev does not populate /dev/rbln* automatically. The command is idempotent: pre-existing nodes are silently skipped.

Command

$ sudo rbln-smi mknod [--rsd <path>] [--dev-dir <path>]

Options

Argument Description
--rsd <path> RSD control device used to enumerate the NPUs. Default /dev/rsd0.
--dev-dir <path> Directory in which to create the device nodes. Default /dev.

Notes

Nodes are created with mode 0666, matching the host's udev defaults. The process umask is cleared while the nodes are created and restored on exit, so other files in --dev-dir are not affected.
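
The clear-then-restore umask pattern described above can be illustrated without special privileges. The Python sketch below creates a regular file instead of a device node, so it demonstrates the general technique and is not the rbln-smi implementation. create_with_exact_mode is a hypothetical name.

```python
import os
import stat
import tempfile


def create_with_exact_mode(path: str, mode: int = 0o666) -> None:
    """Create an empty regular file whose permission bits equal `mode`."""
    old_umask = os.umask(0)  # clear the mask so `mode` is not reduced
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
        os.close(fd)
    finally:
        os.umask(old_umask)  # always restore the caller's mask


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "node")
        create_with_exact_mode(target)
        print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

Restoring the mask in a finally block means later file creation in the same process sees the original umask even if the creation step fails.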

Inside containers the caller needs CAP_MKNOD, the Linux capability that authorizes mknod(2) for character and block special files (see capabilities(7)); for example, run the container with --cap-add=MKNOD. Apart from an already-existing node, which is skipped, any mknod(2) error (for example, EPERM when CAP_MKNOD is missing) is reported on stderr and the command exits non-zero.
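
After running mknod in a container, it can be useful to confirm that the expected nodes exist and are character devices. The following Python sketch uses only the standard library. find_char_nodes is a hypothetical helper, and the rbln prefix is taken from the /dev/rbln* naming above.

```python
import glob
import os
import stat


def find_char_nodes(dev_dir: str = "/dev", prefix: str = "rbln") -> dict[str, int]:
    """Return {path: permission_bits} for character devices named prefix*."""
    nodes = {}
    for path in sorted(glob.glob(os.path.join(dev_dir, prefix + "*"))):
        try:
            st = os.stat(path)
        except OSError:
            continue  # for example, a dangling symlink
        if stat.S_ISCHR(st.st_mode):
            nodes[path] = stat.S_IMODE(st.st_mode)
    return nodes


if __name__ == "__main__":
    found = find_char_nodes()
    if not found:
        print("no rbln character devices found")
    for path, mode in found.items():
        print(f"{path} mode={oct(mode)}")
```

Pass the same directory that was given to --dev-dir when the nodes were created somewhere other than /dev.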

Output (example)

Mknod first run (example)
$ sudo rbln-smi mknod
created /dev/rbln0
created /dev/rbln1
2 device(s) processed (2 created).

Mknod re-run (example)
$ sudo rbln-smi mknod
2 device(s) processed (0 created).

Mknod with custom dev dir (example)
$ sudo rbln-smi mknod --dev-dir /run/rbln-dev
created /run/rbln-dev/rbln0
created /run/rbln-dev/rbln1
2 device(s) processed (2 created).

Troubleshooting

Permission denied / subcommand requires sudo

Run the command with sudo (for example: sudo rbln-smi group -h).

No devices are shown

  • Confirm the driver is installed and NPUs are detected.
  • Try rbln-smi -L to list devices.

Topology output is missing

If --topo output is unavailable or incomplete, check the kernel version (6.2 or later is recommended for --topo) and try again with specific devices via --device <ids>.
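
The kernel check can be scripted. This minimal Python sketch compares the running kernel's major.minor version against the 6.2 recommendation listed for --topo. The function names are hypothetical, and the parsing assumes a release string that begins with "major.minor".

```python
import platform
import re


def kernel_version(release=None):
    """Return (major, minor) from a release string like '6.8.0-45-generic'."""
    release = release or platform.release()
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_topo_recommendation(release=None):
    """True when the kernel is 6.2 or later."""
    return kernel_version(release) >= (6, 2)


if __name__ == "__main__":
    print(platform.release(), meets_topo_recommendation())
```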
