
Troubleshoot

Cannot find librbln.so / librbln_runtime.so

This issue occurs when the RBLN runtime libraries installed with rebel-compiler are not visible to the dynamic loader in your current Python environment.

Typical messages, by package:

  rebel-compiler: RuntimeError: Cannot find libraries: ['librbln.so', 'librbln_runtime.so']
  torch-rbln:     FileNotFoundError: Could not find librbln.so

To resolve the issue:

  1. Activate the same environment where rebel-compiler is installed. Confirm with pip show rebel-compiler or uv pip list | grep rebel.
  2. Check whether LD_LIBRARY_PATH or PYTHONPATH from another stack is affecting library search order. Prefer a clean shell or only your project venv.
  3. Reinstall rebel-compiler and torch-rbln according to the Installation page if the setup is unclear.

For diagnostics, run:

python -m torch_rbln.diagnose

If torch or rebel itself fails to import before the diagnostic can run, set the diagnostic flag explicitly:

TORCH_RBLN_DIAGNOSE=1 python -m torch_rbln.diagnose

The report shows LD_LIBRARY_PATH, PYTHONPATH, rebel-compiler package locations, and whether librbln.so / librbln_runtime.so can be found in each directory on the search path.
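A minimal version of this visibility check can be sketched with only the Python standard library. This is an independent illustration, not part of torch_rbln.diagnose:

```python
import ctypes.util
import os

# Ask the dynamic loader whether the RBLN libraries are resolvable.
# find_library() takes the name without the "lib" prefix and ".so" suffix.
for name in ("rbln", "rbln_runtime"):
    path = ctypes.util.find_library(name)
    print(f"lib{name}.so:", path if path else "NOT FOUND on the loader search path")

# The environment variables that most often skew the search order:
for var in ("LD_LIBRARY_PATH", "PYTHONPATH"):
    print(f"{var} =", os.environ.get(var, "<unset>"))
```

If both libraries report NOT FOUND here but pip show rebel-compiler succeeds, the environment mismatch described above is the most likely cause.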

Collect a core dump file

If you encounter a problem while running PyTorch RBLN, send the generated core dump file to client_support@rebellions.ai.

Step 1. Remove the ulimit restriction:

$ ulimit -c unlimited

Step 2. Verify that the restriction has been removed:

$ ulimit -c
unlimited

Step 3. Re-run the affected model script. When the error occurs, a core dump file will be created under /var/crash.

Example output:

$ ls -l /var/crash/*
-rw-r----- 1 rebel1    root   779026 Jul  2 17:50 /var/crash/_usr_bin_python3.10.2029.crash
-rw-r----- 1 rebel2    root 94849351 Jun 25 18:27 /var/crash/_usr_bin_python3.10.2035.crash
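The ulimit steps above can also be checked, and with sufficient privileges applied, from Python through the standard resource module. A minimal sketch:

```python
import resource

# Read the current core-dump size limits (soft, hard), in bytes.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

def fmt(limit):
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

print("core dump soft limit:", fmt(soft))
print("core dump hard limit:", fmt(hard))

# Equivalent of raising `ulimit -c` as far as the hard limit allows:
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
```

Note that an unprivileged process can only raise its soft limit up to the hard limit; if the hard limit is not unlimited, `ulimit -c unlimited` must be arranged by an administrator.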

Log operators that run on CPU

When PyTorch RBLN does not yet support a PyTorch operator or a specific data type, that operation runs on the CPU so execution can continue. This improves model compatibility, but those operations do not benefit from NPU acceleration, so identifying them is useful during optimization.

By default, the PyTorch RBLN log level is set to WARNING, so CPU fallback messages are not displayed. CPU fallback operations are logged at the INFO level. To identify operators that run on the CPU, set TORCH_RBLN_LOG_LEVEL to INFO or a more verbose setting.

Level     Description                                                      Notes
DEBUG     Detailed internal states, function entry/exit, parameter values  Debug builds only
INFO      Runtime information, including CPU fallback notifications
WARNING   Important warnings that may affect execution                     Default
ERROR     Errors and critical failures only

$ export TORCH_RBLN_LOG_LEVEL=INFO

To restore the default setting:

$ export TORCH_RBLN_LOG_LEVEL=WARNING

With this setting, running a model in eager mode prints a log whenever an operation runs on the CPU instead of an RBLN NPU. The log includes the operator name and, if traceable, the source code location.

[2026-03-27 00:00:00.000][I] `aten::pow` op ran on CPU instead of RBLN
/transformers/models/llama/modeling_llama.py:73: UserWarning: TRACE
  variance = hidden_states.pow(2).mean(-1, keepdim=True)
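If you would rather configure the level from Python than from the shell, setting the variable before the plugin is imported should be equivalent (assumption: torch_rbln reads TORCH_RBLN_LOG_LEVEL once at import time):

```python
import os

# Must be set before torch / torch.rbln are imported so the plugin sees it.
os.environ["TORCH_RBLN_LOG_LEVEL"] = "INFO"

# import torch        # imports go after the variable is set
# import torch.rbln   # CPU-fallback INFO logs are now emitted in eager mode
```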

Lower-than-expected memory statistics

You may see memory statistics APIs such as memory_allocated() or memory_stats() return lower values than expected immediately after creating tensors on the rbln device. Device memory usage can also appear to be zero or very low even after allocating large tensors.

This happens because RBLN tensors use lazy memory allocation:

  • Tensors are initially allocated in CPU memory when they are created.
  • Device memory allocation is deferred until the tensor is actually needed for device operations.
  • When a device operation is required, tensor data is transferred from CPU memory to device memory.
  • Memory-related APIs such as memory_allocated(), memory_reserved(), and memory_stats() reflect device memory only, not CPU memory.
  • Dynamo caching, used by torch.compile(), may also keep compiled graphs and associated device memory alive.

These low values are expected behavior rather than a bug, but they can make memory usage harder to interpret during debugging or performance analysis.

To check memory usage more accurately:

  • Inspect memory statistics after operations that materialize tensors on the device.
  • Reset the Dynamo cache before checking statistics if you want to exclude cached graph memory.
import torch
import torch.rbln

# Create tensors
x = torch.randn(1024, 1024, device="rbln")
y = torch.randn(1024, 1024, device="rbln")

# Memory stats immediately after creation may be low
print("After creation:", torch.rbln.memory_allocated() / 1024, "KB")

# Perform a device operation to materialize tensors
z = x + y

# Reset Dynamo cache to exclude cached graph memory from statistics
torch._dynamo.reset()

# Memory stats after materialization will reflect actual device memory usage
print("After operation:", torch.rbln.memory_allocated() / 1024, "KB")

Use torch._dynamo.reset() before checking memory statistics if you want to exclude cached graph memory and focus only on tensor memory usage.