PyTorch RBLN: A PyTorch extension for RBLN NPU

Warning

PyTorch RBLN is in beta and subject to change.

PyTorch RBLN (torch-rbln) is a PyTorch extension that makes the computing power of the Rebellions NPU available naturally within PyTorch. By implementing eager mode, which operates in a ‘define-by-run’ fashion, it provides a familiar user experience across the entire lifecycle of model development, deployment, and serving in the PyTorch ecosystem, and it makes common debugging workflows convenient.

Because PyTorch RBLN exposes the same interface as the GPU and CPU backends, we expect it to provide a seamless experience for developers and customers who want to use the Rebellions NPU in PyTorch.

How to install

You need an RBLN Portal account to install torch-rbln. To install the latest release via pip:

$ pip3 install -i https://pypi.rbln.ai/simple/ torch-rbln

For details on the latest version and changes, refer to the release notes.

Note

torch-rbln is an out-of-tree extension of PyTorch. Installing torch-rbln therefore automatically pulls in torch as a dependency.
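
As a quick check that the installation works end to end, you can create a tensor on the NPU. This is a minimal sketch, assuming the extension is imported as torch_rbln (based on the package name) and registers the device name rbln, as described in the Design Overview below:

import torch
import torch_rbln  # assumed import name for the torch-rbln extension

# Creating a tensor on the NPU exercises the full stack: if the extension,
# compiler, and driver are installed correctly, this succeeds without error.
x = torch.ones(2, 2, device="rbln")
print(x.device)  # e.g. rbln:0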

Design Overview

When you execute PyTorch ops such as torch.add or torch.mul, they are bound to PyTorch ATen operators such as aten::add or aten::mul inside PyTorch. Through the dispatcher, those ATen operations are linked to target-specific implementations in the various backends, such as CPU, CUDA, and XLA. To decide which backend serves an operation, the PyTorch dispatcher mainly uses the device attribute of the input tensors.

Among the dispatch keys in PyTorch is PrivateUse1, which is reserved for out-of-tree (OOT) extensions that add support for new devices. PyTorch RBLN uses this OOT mechanism to register the RBLN device-specific runtime and op implementations in PyTorch. Therefore, if you define input tensors with the rbln device attribute and apply an operator to them, the operator is dispatched to the RBLN implementation through the PrivateUse1 dispatch key. The op implementations are mainly provided via torch.compile, which serves as the PyTorch interface to the RBLN compiler for creating and running executables for the RBLN device.
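
The following sketch illustrates this dispatch path. The torch_rbln import name is an assumption based on the package name; the rbln device name comes from the description above:

import torch
import torch_rbln  # assumed import name; registers the rbln (PrivateUse1) backend

a = torch.randn(4, 4, device="rbln")
b = torch.randn(4, 4, device="rbln")

# Both inputs carry the rbln device attribute, so the dispatcher routes
# aten::add to the RBLN implementation via the PrivateUse1 dispatch key.
c = torch.add(a, b)

# Copy the result back to the host for inspection.
print(c.cpu())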

PyTorch RBLN components

PyTorch RBLN consists of the following four components: PyTorch, torch-rbln, RBLN Compiler & Runtime, and RBLN Driver.

  • PyTorch (torch)
    • libtorch_python.so: Interface between Python and PyTorch C++ backend
    • libtorch.so: Main library of PyTorch, providing PyTorch C++ APIs
    • libtorch_cpu.so: LibTorch backend library for CPU
    • libc10.so: Library for low-level PyTorch utilities and basic structures
      • Tensor management, device abstraction, memory allocation, etc.
  • PyTorch RBLN (torch-rbln)
    • libtorch_rbln.so: C++ library of ATen ops (e.g., copy and resize) implemented using the RBLN Runtime API
    • register_ops.py: Python op implementations for the Rebellions NPU, built on torch.compile (see the sketch after this list)
    • libc10_rbln.so: PyTorch runtime component library based on libc10.so
  • RBLN Compiler (rebel-compiler)
    • librbln.so: Rebellions Compiler and Runtime Library
  • RBLN Driver
    • librbln-thunk.so: Rebellions NPU device driver
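
The sketch below traces a call through these layers in eager mode. As before, the torch_rbln import name is an assumption; what it illustrates is that an ordinary module, once its tensors live on the rbln device, runs through torch-rbln, the RBLN Compiler & Runtime, and the RBLN Driver without any backend-specific code:

import torch
import torch_rbln  # assumed import name for the torch-rbln extension

class SmallNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.fc(x))

# .to("rbln") moves parameters to the NPU via the copy ops in libtorch_rbln.so.
model = SmallNet().to("rbln")
x = torch.randn(8, 8, device="rbln")

# Each op in forward() is dispatched through PrivateUse1; the Python op
# implementations in register_ops.py use torch.compile to hand the work to
# the RBLN Compiler (librbln.so), which executes it on the NPU through the
# driver stack (librbln-thunk.so).
out = model(x)
print(out.cpu())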

Supported Ops

You can refer to the list of supported operations here. The set of operations covered by PyTorch RBLN will continue to expand as the RBLN SDK is updated.

Tutorials

To help users get started with PyTorch RBLN, we provide several tutorials demonstrating its capabilities: