Quick Start¶
The RBLN SDK enables developers to run deep learning models efficiently on the RBLN Neural Processing Unit (NPU). This guide walks you through the complete workflow—from setup to inference—using practical examples with PyTorch and HuggingFace models. Follow these steps to get started:
- Setup & Installation
- Construct or Import a Model
- Compile a Model
- Model Inference
- Using the Model Serving Framework
1. Setup & Installation¶
System Requirements¶
- Ubuntu 22.04 LTS (Debian bullseye) or higher
- Python 3.9 - 3.12
- A system equipped with an RBLN NPU
- RBLN Driver
This tutorial assumes the above system requirements are met. You can check that the RBLN Driver is installed and that an RBLN NPU is present using the `rbln-stat` CLI.
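A minimal check, assuming `rbln-stat` was installed along with the RBLN Driver and is on your `PATH` (the exact output format may vary by release):

```bash
# Shows the RBLN NPU devices detected on this host and their current status.
rbln-stat
```

If the command lists your NPU device(s) without errors, the driver installation is working.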
Install the RBLN SDK¶
The RBLN SDK is distributed as `.whl` packages. Note that `rebel-compiler` and `vllm-rbln` require an RBLN Portal account.
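As an illustration only (the actual package sources and file names depend on your RBLN Portal account, SDK release, and Python version):

```bash
# Illustrative file/package names; replace them with the wheels provided
# for your environment through the RBLN Portal.
pip3 install ./rebel_compiler-<version>.whl
pip3 install optimum-rbln   # assumed to be installable from the public index
```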
2. Construct or Import a Model¶
Before compiling a model for the RBLN NPU, you need to construct or import it using a supported deep learning framework. The RBLN SDK supports models from frameworks and libraries such as `tensorflow`, `torch`, `transformers`, and `diffusers`. This section provides examples for a non-HuggingFace PyTorch model (Option 1) and a HuggingFace Diffusers model (Option 2).
Option 1: Non-HuggingFace Model¶
Non-HuggingFace models, such as custom PyTorch or TensorFlow models, are compiled using `rebel-compiler`.
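As the non-HuggingFace example for this guide, the sketch below defines a small custom PyTorch module; the `SimpleConvBNRelu` layer sizes and input channels are illustrative assumptions, and any `torch.nn.Module` with static input shapes can be used the same way. This model is compiled in Section 3.

```python
import torch
import torch.nn as nn

# Illustrative toy model (conv -> batch norm -> ReLU); the exact layer
# configuration is an assumption made for this guide.
class SimpleConvBNRelu(nn.Module):
    def __init__(self, in_channels: int = 3, out_channels: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

# Switch to eval mode so BatchNorm uses its running statistics during compilation.
model = SimpleConvBNRelu().eval()
```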
Option 2: HuggingFace Model¶
HuggingFace models, built with libraries like `transformers` or `diffusers`, are compiled using `optimum-rbln`, an RBLN-optimized extension of the HuggingFace APIs.
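The rest of this guide uses a Stable Diffusion XL checkpoint as the HuggingFace example. For reference only, this is how the original pipeline would be loaded with plain `diffusers`; the checkpoint ID is an assumed example, and with `optimum-rbln` this loading step is folded into compilation in Section 3, so it does not need to be run separately.

```python
# Reference only: plain HuggingFace/diffusers usage without the RBLN NPU.
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"  # assumed example checkpoint
)
```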
3. Compile a Model¶
To run a model on the RBLN NPU, it must first be compiled into a format optimized for the hardware. The compilation function you use depends on whether your model integrates with the HuggingFace ecosystem (e.g., the `transformers` or `diffusers` libraries). Use `rebel-compiler` for non-HuggingFace models, such as custom PyTorch or TensorFlow models, and `optimum-rbln` for models leveraging HuggingFace APIs. Refer to Option 1 for non-HuggingFace models or Option 2 for HuggingFace-compatible models.
Option 1: Non-HuggingFace Model¶
Non-HuggingFace models, such as custom PyTorch or TensorFlow models, are compiled using the `rebel-compiler` API, which converts the model into an RBLN NPU-compatible format. For PyTorch models, like the `SimpleConvBNRelu` model in the example below, use the `rebel.compile_from_torch()` function.
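A minimal sketch, assuming the `SimpleConvBNRelu` model defined in Section 2 and a 1x3x224x224 input; the input name, shape, output file name, and the exact form of the input specification are assumptions and may differ between SDK versions.

```python
import rebel  # provided by the rebel-compiler package

# Compile the PyTorch model for the RBLN NPU.
# The input specification (name, shape, dtype) is illustrative.
compiled_model = rebel.compile_from_torch(
    model,
    [("x", [1, 3, 224, 224], "float32")],
)

# Save the compiled artifact so it can be loaded later for inference.
compiled_model.save("simple_conv_bn_relu.rbln")
```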
Option 2: HuggingFace Model¶
HuggingFace models are compiled with `optimum-rbln`. The example below compiles a Stable Diffusion XL model using `RBLNStableDiffusionXLPipeline`, which adapts the `diffusers` `StableDiffusionXLPipeline` class for the RBLN NPU. Set the `export` argument to `True` to enable compilation.
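A minimal sketch, assuming the usual HuggingFace `from_pretrained()` / `save_pretrained()` pattern; the checkpoint ID and output directory name are illustrative assumptions.

```python
from optimum.rbln import RBLNStableDiffusionXLPipeline

# Load the original checkpoint and compile it for the RBLN NPU.
pipe = RBLNStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed example checkpoint
    export=True,  # export=True triggers compilation
)

# Save the compiled pipeline locally so it can be reloaded for inference.
pipe.save_pretrained("sdxl_compiled")
```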
4. Model Inference¶
Run inference on the RBLN NPU using a compiled model. The process depends on whether the model was compiled with `rebel-compiler` (non-HuggingFace) or `optimum-rbln` (HuggingFace), as outlined below.
Option 1: Non-HuggingFace Model¶
Non-HuggingFace models, compiled with `rebel-compiler`, use the `rebel.Runtime()` API to load and execute the model on the RBLN NPU.
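A minimal sketch, assuming the `simple_conv_bn_relu.rbln` artifact saved in Section 3; the file name, the dummy input, and the `tensor_type="pt"` / `run()` usage are assumptions to be checked against your SDK version.

```python
import torch
import rebel

# Load the compiled model onto the RBLN NPU.
# tensor_type="pt" (PyTorch tensors for inputs/outputs) is an assumption here.
module = rebel.Runtime("simple_conv_bn_relu.rbln", tensor_type="pt")

# Run inference with a dummy input matching the compiled shape.
x = torch.rand(1, 3, 224, 224)
output = module.run(x)
print(output.shape)
```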
Option 2: HuggingFace Model¶
HuggingFace models, compiled with `optimum-rbln`, also run inference through the `optimum-rbln` API. The example below runs inference on a Stable Diffusion XL model using `RBLNStableDiffusionXLPipeline`, which adapts the `diffusers` `StableDiffusionXLPipeline` class for the RBLN NPU. Unlike compilation (see Option 2 of Section 3), set `export=False` and pass the local path of the compiled model.
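A minimal sketch, assuming the pipeline was saved to `sdxl_compiled` in Section 3; the prompt and output file name are illustrative.

```python
from optimum.rbln import RBLNStableDiffusionXLPipeline

# Load the already-compiled pipeline from its local path.
pipe = RBLNStableDiffusionXLPipeline.from_pretrained(
    "sdxl_compiled",
    export=False,  # the model is already compiled, so skip compilation
)

# Generate an image on the RBLN NPU.
prompt = "A photo of an astronaut riding a horse on the moon"  # illustrative prompt
image = pipe(prompt).images[0]
image.save("generated_image.png")
```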
5. Using the Model Serving Framework¶
The RBLN SDK supports the following model serving frameworks to deploy compiled models in production environments, leveraging the RBLN NPU for efficient inference. Consult the linked documentation for detailed setup and configuration:
- NVIDIA Triton Inference Server – a multi-model, multi-framework inference engine (refer to NVIDIA Triton Inference Server Documentation)
- vllm-rbln – a high-performance inference engine for large language models (refer to vllm-rbln Documentation)
- TorchServe – a framework for building, shipping, and running production-ready PyTorch models (refer to TorchServe Documentation)