Troubleshoot¶
Core issues and resolutions for Optimum RBLN.
Runtime creation fails after compilation¶
from_pretrained creates runtimes on the NPU immediately after compilation. Device memory exhaustion is not the only possible cause, so inspect the original exception at the top of the message first. To skip runtime creation:
Load the saved artifacts separately:
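A minimal sketch of the two steps above: compile without creating runtimes, save the artifacts, then load them in a separate call. It rests on several assumptions not stated in this section: the model class `RBLNLlamaForCausalLM`, the `rbln_create_runtimes=False` keyword, and the `export` flag should all be checked against the installed Optimum RBLN version. The helper names `compile_without_runtime` and `load_compiled` are hypothetical.

```python
def compile_without_runtime(model_id: str, save_dir: str) -> None:
    """Compile a model and save the artifacts without touching the NPU runtime."""
    # Imported lazily so this module loads on machines without the SDK.
    # Assumption: a Llama-family model; substitute the class for your model.
    from optimum.rbln import RBLNLlamaForCausalLM

    model = RBLNLlamaForCausalLM.from_pretrained(
        model_id,
        export=True,                 # compile from the original checkpoint
        rbln_create_runtimes=False,  # assumption: keyword that skips runtime creation
    )
    model.save_pretrained(save_dir)


def load_compiled(save_dir: str):
    """Load previously compiled artifacts; runtimes are created at this point."""
    from optimum.rbln import RBLNLlamaForCausalLM

    return RBLNLlamaForCausalLM.from_pretrained(save_dir, export=False)
```

Splitting the steps this way means a runtime failure in `load_compiled` can be retried without recompiling.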
Device / tensor parallel configuration errors¶
Validation of device and tensor_parallel_size may raise any of these errors:
Run rbln-smi to list available devices, then verify the following:
- The specified device IDs exist on the system.
- tensor_parallel_size ≤ the number of available devices.
- The device list length equals tensor_parallel_size.
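The three checks above can be run before compilation with a small helper. This is an illustrative sketch, not part of Optimum RBLN: `check_device_config` is a hypothetical name, the `available` list is assumed to be copied by hand from the rbln-smi output, and the messages are not the library's own error text.

```python
from typing import List, Sequence


def check_device_config(
    device: Sequence[int],
    tensor_parallel_size: int,
    available: Sequence[int],
) -> List[str]:
    """Return a list of problems; an empty list means all three checks pass."""
    problems: List[str] = []

    # Check 1: every requested device ID exists on the system.
    missing = [d for d in device if d not in available]
    if missing:
        problems.append(f"device IDs not present on this system: {missing}")

    # Check 2: tensor_parallel_size does not exceed the number of devices.
    if tensor_parallel_size > len(available):
        problems.append(
            f"tensor_parallel_size={tensor_parallel_size} exceeds "
            f"{len(available)} available devices"
        )

    # Check 3: the device list has exactly tensor_parallel_size entries.
    if len(device) != tensor_parallel_size:
        problems.append(
            f"device list has {len(device)} entries but "
            f"tensor_parallel_size={tensor_parallel_size}"
        )

    return problems
```

For example, `check_device_config([0, 1], 2, [0, 1, 2, 3])` returns an empty list, while requesting device 4 on a two-device system reports it as missing.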
Flash attention configuration errors¶
Flash attention is enabled by setting attn_impl="flash_attn" or kvcache_partition_len. Compilation fails when any of the following constraints is violated:
- 4,096 ≤ kvcache_partition_len ≤ 32,768
- max_seq_len ≥ 8,192
- max_seq_len must be a multiple of kvcache_partition_len and at least 2 × kvcache_partition_len (e.g. kvcache_partition_len=16,384 requires max_seq_len ≥ 32,768)
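Because compilation is slow, it can help to test these constraints up front. The sketch below encodes only the rules listed above; `check_flash_attn_config` is a hypothetical helper, not an Optimum RBLN function, and its messages are illustrative.

```python
from typing import List


def check_flash_attn_config(max_seq_len: int, kvcache_partition_len: int) -> List[str]:
    """Return a list of violated constraints; an empty list means the pair is valid."""
    problems: List[str] = []

    if not 4096 <= kvcache_partition_len <= 32768:
        problems.append("kvcache_partition_len must be between 4,096 and 32,768")
        if kvcache_partition_len <= 0:
            # The divisibility check below is meaningless for non-positive values.
            return problems

    if max_seq_len < 8192:
        problems.append("max_seq_len must be at least 8,192")

    if max_seq_len % kvcache_partition_len != 0:
        problems.append("max_seq_len must be a multiple of kvcache_partition_len")

    if max_seq_len < 2 * kvcache_partition_len:
        problems.append("max_seq_len must be at least 2 x kvcache_partition_len")

    return problems
```

With `kvcache_partition_len=16384`, a `max_seq_len` of 32768 passes, while 16384 fails the last rule only.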
Logging and debugging¶
Control log verbosity with the OPTIMUM_RBLN_VERBOSE environment variable (default info):
Supported levels: debug, info, warning, error, critical.
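The variable can also be set from Python. The sketch below assumes the library reads OPTIMUM_RBLN_VERBOSE at import time, so the value is set before `optimum.rbln` is imported; that timing, the lowercase normalization, and the helper name `set_rbln_verbosity` are assumptions rather than documented behavior.

```python
import os

# Levels listed in this section.
SUPPORTED_LEVELS = ("debug", "info", "warning", "error", "critical")


def set_rbln_verbosity(level: str) -> str:
    """Validate the level and export it through OPTIMUM_RBLN_VERBOSE."""
    normalized = level.strip().lower()
    if normalized not in SUPPORTED_LEVELS:
        raise ValueError(
            f"unsupported level {level!r}; expected one of {SUPPORTED_LEVELS}"
        )
    os.environ["OPTIMUM_RBLN_VERBOSE"] = normalized
    return normalized


# Assumption: call this before importing optimum.rbln.
set_rbln_verbosity("debug")
```

Validating the value first catches typos such as "verbose" that the library might not report.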