
Quick Start

The RBLN SDK enables developers to run deep learning models efficiently on the RBLN Neural Processing Unit (NPU). This guide walks you through the complete workflow—from setup to inference—using practical examples with PyTorch and HuggingFace models. Follow these steps to get started:

  1. Setup & Installation
  2. Construct or Import a Model
  3. Compile a Model
  4. Model Inference
  5. Using the Model Serving Framework

1. Setup & Installation

System Requirements

  • Ubuntu 22.04 LTS or higher, or Debian bullseye or higher
  • Python 3.9 to 3.12
  • A system equipped with an RBLN NPU
  • RBLN Driver

This tutorial assumes that the system requirements above have been met. You can verify that the RBLN Driver is installed and the RBLN NPUs are detected by running the rbln-stat CLI:

$ rbln-stat
+-------------------------------------------------------------------------------------------------+
|                                 Device Infomation KMD ver: X.X.XX                               |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU |    Name   | Device    |   PCI BUS ID  | Temp |  Power  |    Memory(used/total)    |  Util |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0   | RBLN-CA12 | rbln0     |  0000:19:00.0 |  21C |   6.2W  |      0.0B / 15.7GiB      |   0.0 |
| 1   | RBLN-CA12 | rbln1     |  0000:1a:00.0 |  21C |   6.2W  |      0.0B / 15.7GiB      |   0.0 |
| 2   | RBLN-CA12 | rbln2     |  0000:1b:00.0 |  22C |   6.3W  |      0.0B / 15.7GiB      |   0.0 |
| 3   | RBLN-CA12 | rbln3     |  0000:1c:00.0 |  23C |   6.1W  |      0.0B / 15.7GiB      |   0.0 |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
|                                        Context Information                                       |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process             |     PID      | CTX | Priority | PTID |           Memalloc  | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A                 |     N/A      | N/A |   N/A    | N/A  |                N/A  |  N/A   |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
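To run the same check from a setup script, you could wrap rbln-stat with Python's subprocess module. This is a sketch under stated assumptions: the helper names are hypothetical, and treating a zero exit status as "driver responding" is an assumption rather than documented rbln-stat behavior. The function returns the raw table text and does not parse it.

```python
import shutil
import subprocess


def rbln_stat_available():
    """Return True if an `rbln-stat` executable is found on PATH."""
    return shutil.which("rbln-stat") is not None


def run_rbln_stat(timeout=10):
    """Run `rbln-stat` and return its stdout, or None if it is missing or fails."""
    if not rbln_stat_available():
        return None
    try:
        proc = subprocess.run(
            ["rbln-stat"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Assumption: a zero exit status means the driver answered; this is not
    # documented rbln-stat behavior.
    return proc.stdout if proc.returncode == 0 else None


if __name__ == "__main__":
    output = run_rbln_stat()
    if output is None:
        print("rbln-stat not found or failed; check the RBLN Driver installation.")
    else:
        print(output)
```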

Install the RBLN SDK

The RBLN SDK is distributed as .whl packages. Note that rebel-compiler and vllm-rbln require an RBLN Portal account.

$ pip3 install --extra-index-url https://pypi.rbln.ai/simple/ rebel-compiler==0.7.3 vllm-rbln==0.7.3 optimum-rbln==0.7.3.post2
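After installation, you can confirm that the pinned versions were actually installed. The following sketch uses the standard importlib.metadata module (Python 3.8+). The EXPECTED mapping is copied from the pip command above, and the helper name `check_installed` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the pip command above.
EXPECTED = {
    "rebel-compiler": "0.7.3",
    "vllm-rbln": "0.7.3",
    "optimum-rbln": "0.7.3.post2",
}


def check_installed(expected=EXPECTED):
    """Map each package name to (installed_version_or_None, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_installed().items():
        status = "OK" if ok else "MISMATCH" if installed else "MISSING"
        print(f"{name:16} {installed or '-':14} {status}")
```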

2. Construct or Import a Model

Before compiling a model for the RBLN NPU, you need to construct or import it with a supported deep learning framework. The RBLN SDK supports models from frameworks and libraries such as TensorFlow, PyTorch (torch), and HuggingFace transformers and diffusers. This section provides two examples: a plain PyTorch model that does not use HuggingFace (Option 1) and a HuggingFace Diffusers model (Option 2).

Option 1: Non-HuggingFace Model

Non-HuggingFace models, such as custom PyTorch or TensorFlow models, are compiled using the rebel-compiler.

import torch
import torch.nn as nn

class SimpleConvBNRelu(nn.Module):
    def __init__(self, in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1):
        super(SimpleConvBNRelu, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

model = SimpleConvBNRelu()
print(model)
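Because kernel_size=3, stride=1, and padding=1, the convolution preserves height and width, while BatchNorm2d and ReLU do not change the shape at all. The helper below applies the output-size formula from the PyTorch nn.Conv2d documentation to show that a (1, 3, 224, 224) input, the shape used for compilation later in this guide, yields a (1, 16, 224, 224) output. It uses only the standard library, and the helper names are hypothetical.

```python
def conv2d_out_size(size, kernel_size=3, stride=1, padding=1, dilation=1):
    """Spatial output size of nn.Conv2d along one dimension (PyTorch formula)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def simple_conv_bn_relu_out_shape(input_shape, out_channels=16, **conv_kwargs):
    """Output shape (N, C, H, W) of SimpleConvBNRelu.

    Only the convolution changes the shape; BatchNorm2d and ReLU keep it as-is.
    """
    n, _, h, w = input_shape
    return (
        n,
        out_channels,
        conv2d_out_size(h, **conv_kwargs),
        conv2d_out_size(w, **conv_kwargs),
    )


print(simple_conv_bn_relu_out_shape((1, 3, 224, 224)))  # (1, 16, 224, 224)
```

Changing stride to 2 would halve the spatial size (224 becomes 112), which is one way to sanity-check the formula.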

Option 2: HuggingFace Model

HuggingFace models, built with libraries like transformers or diffusers, are compiled using optimum-rbln, an RBLN-optimized extension of HuggingFace APIs.

from diffusers import StableDiffusionXLPipeline

model_id = "stabilityai/sdxl-turbo"

# export=True is an optimum-rbln argument; the plain diffusers pipeline does not use it.
pipe = StableDiffusionXLPipeline.from_pretrained(model_id)

print(pipe)
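In this guide, the optimum-rbln class used for this pipeline, RBLNStableDiffusionXLPipeline, is the diffusers class name with an RBLN prefix. The sketch below assumes that this prefix convention carries over to other HuggingFace classes; treat that as an assumption and confirm that the class exists in the optimum-rbln documentation. The helper names are hypothetical, and the lookup returns None when optimum-rbln is not installed or does not define the class.

```python
import importlib


def rbln_class_name(hf_class_name):
    """Assumed naming convention: prefix the HuggingFace class name with 'RBLN'."""
    if hf_class_name.startswith("RBLN"):
        return hf_class_name
    return "RBLN" + hf_class_name


def find_rbln_class(hf_class_name):
    """Return the optimum.rbln class if it is installed and defines it, else None."""
    try:
        module = importlib.import_module("optimum.rbln")
    except ImportError:
        return None
    return getattr(module, rbln_class_name(hf_class_name), None)


print(rbln_class_name("StableDiffusionXLPipeline"))  # RBLNStableDiffusionXLPipeline
```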

3. Compile a Model

To run on the RBLN NPU, a model must first be compiled into a format optimized for the hardware. Which compilation API you use depends on whether the model belongs to the HuggingFace ecosystem (the transformers or diffusers libraries): use rebel-compiler for non-HuggingFace models, such as custom PyTorch or TensorFlow models (Option 1), and optimum-rbln for HuggingFace models (Option 2).
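The rule above can be approximated in code by looking at which package defines the model's class. This is a hypothetical heuristic, not part of the RBLN SDK: it walks the class hierarchy so that user subclasses of HuggingFace classes are still routed to optimum-rbln, and it treats everything else (including a plain torch.nn.Module such as SimpleConvBNRelu) as a rebel-compiler model.

```python
# Top-level packages treated as "HuggingFace ecosystem" (assumption for this sketch).
HF_LIBRARIES = ("transformers", "diffusers")


def compile_path_for(model):
    """Return "optimum-rbln" for HuggingFace models, otherwise "rebel-compiler".

    Hypothetical heuristic: checks the defining package of every class in the
    model's MRO, so subclasses of transformers/diffusers classes are detected.
    """
    for cls in type(model).__mro__:
        if cls.__module__.split(".")[0] in HF_LIBRARIES:
            return "optimum-rbln"
    return "rebel-compiler"
```

For example, a SimpleConvBNRelu instance would map to "rebel-compiler", while a StableDiffusionXLPipeline loaded from diffusers would map to "optimum-rbln".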

Option 1: Non-HuggingFace Model

Non-HuggingFace models, such as custom PyTorch or TensorFlow models, are compiled using the rebel-compiler API. This tool converts the model into an RBLN NPU-compatible format. For PyTorch models, like the SimpleConvBNRelu in the example below, use the rebel.compile_from_torch() function:

import rebel
import torch
import torch.nn as nn

class SimpleConvBNRelu(nn.Module):
    def __init__(self, in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1):
        super(SimpleConvBNRelu, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

model = SimpleConvBNRelu().eval()

x = torch.rand(1, 3, 224, 224)

input_info = [
    ("x", list(x.shape), torch.float32),
]

compiled_model = rebel.compile_from_torch(model, input_info)
compiled_model.save("simple_conv_bn_relu.rbln")

Option 2: HuggingFace Model

HuggingFace models are compiled with optimum-rbln. The example below compiles a Stable Diffusion XL model using RBLNStableDiffusionXLPipeline(), which adapts the diffusers StableDiffusionXLPipeline class for the RBLN NPU. Set the export argument to True to enable compilation:

from optimum.rbln import RBLNStableDiffusionXLPipeline

model_id = "stabilityai/sdxl-turbo"

pipe = RBLNStableDiffusionXLPipeline.from_pretrained(
    model_id,
    export=True,
    rbln_guidance_scale=0.0,
)

pipe.save_pretrained("rbln-sdxl-turbo")
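Before loading the pipeline with export=False (Section 4), it can help to confirm that save_pretrained() actually wrote the compiled artifacts. The sketch below makes no assumptions about the file layout inside rbln-sdxl-turbo: it only lists every file with its size and raises FileNotFoundError if the directory is missing. The helper name `summarize_compiled_dir` is hypothetical.

```python
from pathlib import Path


def summarize_compiled_dir(path="rbln-sdxl-turbo"):
    """Return {relative_path: size_in_bytes} for every file under `path`.

    Raises FileNotFoundError if the directory does not exist, which usually
    means the compile step was not run or saved to a different location.
    """
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(f"compiled model directory not found: {root}")
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    try:
        files = summarize_compiled_dir()
    except FileNotFoundError as err:
        print(err)
    else:
        total_mib = sum(files.values()) / 2**20
        print(f"{len(files)} files, {total_mib:.1f} MiB total")
```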

4. Model Inference

Run inference on the RBLN NPU using a compiled model. The process depends on whether the model was compiled with rebel-compiler (non-HuggingFace) or optimum-rbln (HuggingFace), as outlined below.

Option 1: Non-HuggingFace Model

Non-HuggingFace models, compiled with rebel-compiler, use the rebel.Runtime() API to load and execute the model on the RBLN NPU.

import rebel
import torch

x = torch.rand(1, 3, 224, 224)

module = rebel.Runtime("simple_conv_bn_relu.rbln")
inputs = x.numpy()
result = module.run(inputs)

print("--- Input ---")
print(inputs)
print("--- Result ---")
print(result)
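The model above was compiled for an input x of shape [1, 3, 224, 224] (the input_info passed to rebel.compile_from_torch() in Section 3). The sketch below assumes that the compiled model accepts only that exact shape; confirm this in the rebel-compiler documentation. A small pre-flight check then gives a clearer error than a runtime failure. The helper `check_input_shape` is hypothetical, and the dtype is written as the placeholder string "float32" so the sketch runs without PyTorch (the check ignores dtype).

```python
def check_input_shape(input_info, name, shape):
    """Raise if `shape` differs from the shape recorded for `name` in input_info.

    `input_info` uses the same (name, shape, dtype) tuple layout passed to
    rebel.compile_from_torch(); the dtype entry is ignored here.
    """
    expected = {entry_name: list(entry_shape) for entry_name, entry_shape, _ in input_info}
    if name not in expected:
        raise KeyError(f"no compiled input named {name!r}")
    if list(shape) != expected[name]:
        raise ValueError(
            f"input {name!r}: expected shape {expected[name]}, got {list(shape)}"
        )


# Placeholder dtype string (assumption); the original example uses torch.float32.
input_info = [("x", [1, 3, 224, 224], "float32")]
check_input_shape(input_info, "x", (1, 3, 224, 224))  # passes silently
```

In the inference script, you would call check_input_shape(input_info, "x", inputs.shape) just before module.run(inputs).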

Option 2: HuggingFace Model

HuggingFace models, compiled with optimum-rbln, perform inference using the same optimum-rbln API. The example below runs inference on a Stable Diffusion XL model using RBLNStableDiffusionXLPipeline(), which adapts the diffusers StableDiffusionXLPipeline class for the RBLN NPU. Unlike compilation (see Option 2 of Section 3), set export=False and use the local path of the compiled model:

from optimum.rbln import RBLNStableDiffusionXLPipeline

prompt = "A cinematic shot of a baby raccoon wearing an intricate Italian priest robe."

pipe = RBLNStableDiffusionXLPipeline.from_pretrained(
    model_id="rbln-sdxl-turbo",
    export=False,
)

image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]

image.save("generated_img.png")

5. Using the Model Serving Framework

The RBLN SDK supports the following model serving frameworks to deploy compiled models in production environments, leveraging the RBLN NPU for efficient inference. Consult the linked documentation for detailed setup and configuration: