Troubleshoot¶

Core issues and resolutions for Optimum RBLN.

Runtime creation fails after compilation¶

Failed to create RBLN runtime: ...

If you only need to compile the model without loading it to NPU, you can use:
  from_pretrained(..., rbln_create_runtimes=False) or
  from_pretrained(..., rbln_config={..., 'create_runtimes': False})

from_pretrained creates runtimes on the NPU immediately after compilation, so this error appears when that step fails. Device memory exhaustion is one possible cause, but not the only one; inspect the original exception at the top of the message first. To skip runtime creation:

from optimum.rbln import RBLNModel
# Compile only; do not create runtimes on the NPU
model = RBLNModel.from_pretrained(model_id, rbln_create_runtimes=False)
model.save_pretrained(save_dir)

Load the saved artifacts separately:

model = RBLNModel.from_pretrained(save_dir)

Device / num_devices configuration errors¶

Validation of device and num_devices may raise any of these errors:

`Device {device_id} is not a valid NPU device. Please check your NPU status with 'rbln-smi' command.`

`num_devices` ({N}) is greater than the number of available devices {M}.

The number of devices ({len_device}) does not match `num_devices` ({num_devices}).

Run rbln-smi to list available devices, then verify the following:

The specified device IDs exist on the system.
num_devices ≤ the number of available devices.
The device list length equals num_devices.
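The three checks above can be run by hand before compiling. The following is a minimal sketch, not the library's own validator: the function name `check_device_config` and its `available` parameter are hypothetical, and the list of available device IDs is assumed to be supplied by you (for example, read off the rbln-smi output).

```python
from typing import List, Sequence


def check_device_config(
    device: Sequence[int], num_devices: int, available: Sequence[int]
) -> List[str]:
    """Return a list of problems; an empty list means all three checks pass.

    Hypothetical helper for illustration only. `available` is the list of
    device IDs present on the system, as reported by rbln-smi.
    """
    problems = []

    # Check 1: every requested device ID exists on the system.
    for device_id in device:
        if device_id not in available:
            problems.append(f"device {device_id} is not present on this system")

    # Check 2: num_devices does not exceed the number of available devices.
    if num_devices > len(available):
        problems.append(
            f"num_devices ({num_devices}) exceeds available devices ({len(available)})"
        )

    # Check 3: the device list length equals num_devices.
    if len(device) != num_devices:
        problems.append(
            f"device list length ({len(device)}) does not match num_devices ({num_devices})"
        )

    return problems


if __name__ == "__main__":
    # Example: four devices on the system, two requested.
    for problem in check_device_config([0, 1], 2, available=[0, 1, 2, 3]):
        print(problem)
```

Returning every problem at once, rather than raising on the first, makes it easier to fix a configuration in one pass.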

Flash attention configuration errors¶

Flash attention is enabled by setting attn_impl="flash_attn" or kvcache_partition_len. Compilation fails when any of the following constraints is violated:

`max_seq_len` ({X}) must be a multiple of `kvcache_partition_len` ({Y}) when using 'flash_attn'.

`kvcache_partition_len` ({X}) is out of the supported range (4096 <= kvcache_partition_len <= 32768).

`max_seq_len` ({X}) is too small for 'flash_attn'. The minimum supported value is 8192.

4,096 ≤ kvcache_partition_len ≤ 32,768
max_seq_len ≥ 8,192
max_seq_len must be a multiple of kvcache_partition_len and at least 2 × kvcache_partition_len
(e.g. kvcache_partition_len=16,384 requires max_seq_len ≥ 32,768)
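These length constraints are simple arithmetic, so a candidate pair of values can be checked before starting a long compilation. The sketch below restates the rules listed above; the function name `check_flash_attn_lengths` is hypothetical and is not part of Optimum RBLN.

```python
from typing import List


def check_flash_attn_lengths(max_seq_len: int, kvcache_partition_len: int) -> List[str]:
    """Return a list of violated constraints; empty means the pair is acceptable.

    Hypothetical helper for illustration; it mirrors the documented rules only.
    """
    problems = []

    # max_seq_len must be at least 8,192.
    if max_seq_len < 8192:
        problems.append(f"max_seq_len ({max_seq_len}) is below the minimum of 8192")

    # kvcache_partition_len must lie in [4096, 32768].
    if not 4096 <= kvcache_partition_len <= 32768:
        problems.append(
            f"kvcache_partition_len ({kvcache_partition_len}) is outside 4096..32768"
        )
        # The remaining checks assume a valid partition length.
        return problems

    # max_seq_len must be a multiple of kvcache_partition_len.
    if max_seq_len % kvcache_partition_len != 0:
        problems.append(
            f"max_seq_len ({max_seq_len}) is not a multiple of "
            f"kvcache_partition_len ({kvcache_partition_len})"
        )

    # max_seq_len must be at least twice kvcache_partition_len.
    if max_seq_len < 2 * kvcache_partition_len:
        problems.append(
            f"max_seq_len ({max_seq_len}) is less than 2 x "
            f"kvcache_partition_len ({kvcache_partition_len})"
        )

    return problems


if __name__ == "__main__":
    # The example from the text: a 16,384 partition needs max_seq_len >= 32,768.
    print(check_flash_attn_lengths(32768, 16384))
    print(check_flash_attn_lengths(16384, 16384))
```

Note that the smallest valid combination, kvcache_partition_len=4,096 with max_seq_len=8,192, satisfies the 8,192 minimum and the 2 × rule at the same time.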

Logging and debugging¶

Control log verbosity with the OPTIMUM_RBLN_VERBOSE environment variable (default info):

$ OPTIMUM_RBLN_VERBOSE=debug python inference.py    # detailed logs
$ OPTIMUM_RBLN_VERBOSE=warning python inference.py  # warnings and errors only

Supported levels: debug, info, warning, error, critical.
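The same variable can also be set from inside a Python script instead of on the command line. This is a minimal sketch: the helper `set_rbln_verbosity` is hypothetical, and it assumes the variable must be set before optimum.rbln is imported, which is worth verifying for your installed version.

```python
import os

# Levels listed in this guide.
SUPPORTED_LEVELS = ("debug", "info", "warning", "error", "critical")


def set_rbln_verbosity(level: str) -> str:
    """Validate a level name and export it as OPTIMUM_RBLN_VERBOSE.

    Hypothetical helper for illustration; returns the normalized level.
    """
    normalized = level.strip().lower()
    if normalized not in SUPPORTED_LEVELS:
        raise ValueError(
            f"unsupported level {level!r}; expected one of {', '.join(SUPPORTED_LEVELS)}"
        )
    os.environ["OPTIMUM_RBLN_VERBOSE"] = normalized
    return normalized


# Assumption: call this before importing optimum.rbln so the setting is picked up.
set_rbln_verbosity("debug")
```

Validating the name up front catches a typo in the level before the script starts.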

Topic guides¶

Multi-Module Models