PyTorch RBLN: A PyTorch extension for RBLN NPUs¶
PyTorch RBLN beta status
PyTorch RBLN is currently in beta and under active development. APIs may change or be removed between releases, and backward compatibility is not guaranteed. Supported operations are currently limited. We do not recommend using this integration in production workloads yet. We encourage you to try it and share feedback to help us stabilize it for general availability.
Overview¶
PyTorch RBLN (torch-rbln) is a PyTorch extension that enables RBLN NPUs to be used directly from PyTorch. By supporting eager (define-by-run) execution, it preserves a familiar workflow across model development, deployment, and serving in the PyTorch ecosystem, and it is also useful for debugging.
With the same device-oriented programming model used for CPUs and GPUs, PyTorch RBLN is designed to provide a seamless way to use RBLN NPUs from standard PyTorch code. This page explains how execution reaches the device, what software layers make that possible, and where to go next for installation, supported operators, and troubleshooting.
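To make the device-oriented model concrete, here is a minimal sketch of what standard PyTorch code targeting an RBLN NPU could look like. It assumes torch-rbln is installed and exposes the `rbln` device string described on this page; the `try`/`except` fallback to CPU is purely illustrative so the sketch also runs without the extension.

```python
import torch

# Assumption: with torch-rbln installed, "rbln" is a valid device string,
# just like "cpu" or "cuda". Fall back to CPU so this sketch also runs
# on a machine without an RBLN NPU.
try:
    device = torch.device("rbln")
    torch.empty(1, device=device)  # probe that the backend is actually usable
except RuntimeError:
    device = torch.device("cpu")

# The rest is ordinary eager (define-by-run) PyTorch code.
x = torch.randn(4, 4, device=device)
w = torch.randn(4, 4, device=device)
y = (x @ w).relu()
print(y.device)
```

Because execution is eager, each operation runs as it is called, which is what keeps the usual print-and-inspect debugging workflow available.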
Execution flow¶
When a PyTorch operation runs, the dispatcher selects the backend implementation based on tensor device metadata. For tensors on the `rbln` device, the dispatcher routes execution through `PrivateUse1` and then proceeds through `torch.compile` to the RBLN Compiler.
- User-facing operations such as `torch.add` resolve to ATen operators such as `aten::add`.
- The PyTorch dispatcher selects the implementation for the target backend and routes `rbln` tensors through `PrivateUse1`. `PrivateUse1` is the dispatch key reserved for Out-of-Tree (OOT) extensions and used by third-party backends such as RBLN.
- `torch.compile` serves as the entry point to the RBLN path.
- The RBLN Compiler handles compilation and execution for the RBLN path.
- The RBLN Driver provides the low-level device interface.
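The role `torch.compile` plays as an entry point can be sketched with a custom backend built from public PyTorch APIs only. This is not the RBLN backend itself; it simply shows where a backend such as the RBLN path receives the captured graph before handing it to its compiler. Here the stand-in backend just lists the captured operations and executes the graph eagerly.

```python
import torch

# A minimal stand-in for a torch.compile backend. A real backend (like the
# RBLN path) would pass the captured graph to its compiler; this one only
# prints the operations Dynamo captured and then runs the graph eagerly.
def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    for node in gm.graph.nodes:
        if node.op == "call_function":
            print(node.target)  # operations captured from the traced function
    return gm.forward  # execute the captured graph without further compilation

@torch.compile(backend=inspect_backend)
def f(x, y):
    return torch.add(x, y) * 2

out = f(torch.ones(2), torch.ones(2))
print(out)  # tensor([4., 4.])
```

Swapping `inspect_backend` for a compiler-backed implementation is exactly the mechanism that lets `torch.compile` serve as the bridge from PyTorch code to a vendor compiler.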
Software stack¶
The runtime flow above shows execution order. The stack below shows the same system as software layers: PyTorch, torch-rbln, the RBLN Compiler, and the RBLN Driver. The compiler layer includes runtime components for compilation and execution, while the driver provides the low-level hardware interface to RBLN NPUs.
The diagram uses short labels only. Each box corresponds to one row in the table below, in the same top-to-bottom order within each layer. Use the Library / file column for concrete artifacts and the Role column for fuller descriptions.
| Layer | Library / file | Role |
|---|---|---|
| PyTorch (`torch`) | `libtorch_python.so` | Interface between Python and the PyTorch C++ backend |
| | `libtorch.so` | Main PyTorch library; provides PyTorch C++ APIs |
| | `libtorch_cpu.so` | LibTorch CPU backend |
| | `libc10.so` | Low-level PyTorch utilities and core structures (tensor management, device abstraction, memory allocation, etc.) |
| PyTorch RBLN (`torch-rbln`) | `libtorch_rbln.so` | ATen operator C++ library for RBLN runtime calls such as copy and resize |
| | `register_ops.py` | Python operator implementations for RBLN NPUs (via `torch.compile`) |
| | `libc10_rbln.so` | RBLN C10 extension: extends `libc10` for RBLN device types and related hooks |
| RBLN Compiler (`rebel-compiler`) | `librbln.so` | Compiler library with runtime components for RBLN NPUs |
| RBLN Driver | `librbln-thunk.so` | Low-level device / hardware interface for RBLN NPUs |
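The torch-rbln layer in the table plugs into PyTorch through the `PrivateUse1` mechanism mentioned above. As a hedged illustration of that mechanism (not necessarily what torch-rbln does internally), any out-of-tree extension can claim the `PrivateUse1` dispatch key and give it a user-facing device name using a public PyTorch API:

```python
import torch

# PrivateUse1 is the dispatch key PyTorch reserves for out-of-tree backends.
# An extension can claim it and surface it under a friendly device name.
# Illustrative only: a real extension (such as torch-rbln) also registers
# C++ kernels and a device module; without those, allocating tensors on the
# device will still fail.
torch.utils.rename_privateuse1_backend("rbln")

d = torch.device("rbln")
print(d.type)  # the PrivateUse1 key is now addressable as "rbln"
```

Note that `rename_privateuse1_backend` can only be called once per process, which is why a single extension owns the `PrivateUse1` slot.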
Next steps¶
- Installation — Install from pre-built wheels or build from source.
- Tutorials — Learn the basic workflow and try example models.
- Supported ops — Check the current operator coverage.
- Troubleshoot — Resolve common setup and runtime issues.