Quick Start

This guide walks through the workflow for running deep learning models on RBLN NPUs with the RBLN SDK. The tutorial covers the following steps:

Setup and installation
Creating or importing a model
Compiling the model
Running inference
Using a model serving framework

1. Setup and Installation

System Requirements

Ubuntu 22.04 LTS (Debian bullseye) or later
Python (3.9 to 3.12 supported)
A system with an RBLN NPU installed
RBLN driver
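If you are unsure whether your interpreter falls inside the supported range, a quick check like the one below can help. It is a minimal sketch that only encodes the 3.9 to 3.12 range stated above; `python_supported` is a hypothetical helper, not part of the RBLN SDK.

```python
import sys

# Supported range stated in the requirements above (inclusive).
MIN_PYTHON = (3, 9)
MAX_PYTHON = (3, 12)


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of version_info is in the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


current = ".".join(map(str, sys.version_info[:3]))
print(f"Python {current}: {'supported' if python_supported() else 'NOT supported'}")
```

Passing a tuple such as `(3, 8, 18)` lets you test the check without switching interpreters.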

This tutorial assumes a system that already meets the requirements above. You can use the rbln-stat command to check that the RBLN driver is installed and that the RBLN NPUs are detected:

$ rbln-stat
+-------------------------------------------------------------------------------------------------+
|                                 Device Infomation KMD ver: X.X.XX                               |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| NPU |    Name   | Device    |   PCI BUS ID  | Temp |  Power  |    Memory(used/total)    |  Util |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
| 0   | RBLN-CA12 | rbln0     |  0000:19:00.0 |  21C |   6.2W  |      0.0B / 15.7GiB      |   0.0 |
| 1   | RBLN-CA12 | rbln1     |  0000:1a:00.0 |  21C |   6.2W  |      0.0B / 15.7GiB      |   0.0 |
| 2   | RBLN-CA12 | rbln2     |  0000:1b:00.0 |  22C |   6.3W  |      0.0B / 15.7GiB      |   0.0 |
| 3   | RBLN-CA12 | rbln3     |  0000:1c:00.0 |  23C |   6.1W  |      0.0B / 15.7GiB      |   0.0 |
+-----+-----------+-----------+---------------+------+---------+--------------------------+-------+
+-------------------------------------------------------------------------------------------------+
|                                        Context Infomation                                       |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| NPU | Process             |     PID      | CTX | Priority | PTID |           Memalloc  | Status |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
| N/A | N/A                 |     N/A      | N/A |   N/A    | N/A  |                N/A  |  N/A   |
+-----+---------------------+--------------+-----+----------+------+---------------------+--------+
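To run this check from a script (for example, in a CI job), you could parse the device table. The sketch below assumes the table layout shown above: a numeric NPU index in the first column, and the Context table following the Device table. `parse_npu_devices` and `list_npus` are hypothetical helpers, and the rbln-stat output format may differ between driver versions.

```python
import shutil
import subprocess


def parse_npu_devices(stat_output: str) -> list:
    """Extract (NPU index, name, device node) from the Device table of rbln-stat output."""
    devices = []
    for line in stat_output.splitlines():
        # Stop at the Context table: its rows can also start with an NPU index.
        if "Context" in line:
            break
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Device rows start with a numeric NPU index; borders and headers do not.
        if len(cells) >= 3 and cells[0].isdigit():
            devices.append({"npu": int(cells[0]), "name": cells[1], "device": cells[2]})
    return devices


def list_npus() -> list:
    """Run rbln-stat if it is on PATH and return the parsed device rows."""
    if shutil.which("rbln-stat") is None:
        raise RuntimeError("rbln-stat not found; is the RBLN driver installed?")
    out = subprocess.run(["rbln-stat"], capture_output=True, text=True, check=True).stdout
    return parse_npu_devices(out)
```

On the system shown above, `list_npus()` would be expected to return four entries, `rbln0` through `rbln3`.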

Installing the RBLN SDK

The RBLN SDK is distributed as .whl packages. Installing rebel-compiler and vllm-rbln requires an RBLN Portal account.

$ pip3 install --extra-index-url https://pypi.rbln.ai/simple/ rebel-compiler==0.8.1 vllm-rbln==0.8.1.post1 optimum-rbln==0.8.1
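After installing, you can confirm that the installed packages match the pinned versions. This is a sketch using the standard library's `importlib.metadata`; `check_pinned_versions` is a hypothetical helper, and the pins simply mirror the command above.

```python
from importlib import metadata

# Versions pinned in the pip command above.
PINNED = {
    "rebel-compiler": "0.8.1",
    "vllm-rbln": "0.8.1.post1",
    "optimum-rbln": "0.8.1",
}


def check_pinned_versions(pinned=PINNED, lookup=metadata.version) -> dict:
    """Return {package: installed_version_or_None} for each package that differs from its pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = installed
    return mismatches
```

Calling `check_pinned_versions()` returns an empty dict when everything matches. The `lookup` parameter exists only so the function can be tested without the packages installed.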

2. Creating or Importing a Model

The RBLN SDK can run models built with a variety of deep learning frameworks, such as TensorFlow, PyTorch, transformers, and diffusers, on RBLN NPUs. This tutorial shows how a PyTorch model (a non-HuggingFace model, Option 1) and a HuggingFace diffusers model (Option 2) can each be run on an RBLN NPU.

Option 1: Non-HuggingFace Model

This model can be compiled with rebel-compiler.

import torch
import torch.nn as nn

class SimpleConvBNRelu(nn.Module):
    def __init__(self, in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1):
        super(SimpleConvBNRelu, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

model = SimpleConvBNRelu()
print(model)
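Knowing the model's output shape is useful when describing inputs for compilation and when checking results later. With the default arguments above, the shape follows from the output-size formula in the PyTorch `nn.Conv2d` documentation; `BatchNorm2d` and `ReLU` do not change the shape. The helper names below are hypothetical.

```python
def conv2d_output_hw(h, w, kernel_size=3, stride=1, padding=1, dilation=1):
    """Spatial output size of nn.Conv2d (formula from the PyTorch Conv2d docs)."""
    def out(size):
        return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    return out(h), out(w)


def simple_conv_bn_relu_output_shape(input_shape, out_channels=16, **conv_kwargs):
    """Output shape of SimpleConvBNRelu: BatchNorm2d and ReLU keep the Conv2d output shape."""
    n, _, h, w = input_shape
    return (n, out_channels, *conv2d_output_hw(h, w, **conv_kwargs))


print(simple_conv_bn_relu_output_shape((1, 3, 224, 224)))  # (1, 16, 224, 224)
```

With `kernel_size=3`, `stride=1`, and `padding=1`, the spatial size is preserved, so a 224x224 input yields a 224x224 output with 16 channels.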

Option 2: HuggingFace Model

This model can be compiled with optimum-rbln.

from diffusers import StableDiffusionXLPipeline

model_id = "stabilityai/sdxl-turbo"

# Plain diffusers pipeline; export is an optimum-rbln argument, so it is omitted here
pipe = StableDiffusionXLPipeline.from_pretrained(model_id)

print(pipe)

3. Compiling the Model

Which RBLN SDK package you use to compile a model depends on whether the model is compatible with HuggingFace APIs such as transformers and diffusers. Non-HuggingFace models must be compiled with rebel-compiler, and HuggingFace models with optimum-rbln. If your model is not compatible with the HuggingFace API, follow Option 1; if it is, follow Option 2.
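If you need to pick between the two paths programmatically, one heuristic is to check whether the model's class (or any base class) comes from the transformers or diffusers packages. This is a sketch under that assumption, not an official RBLN SDK rule; `choose_compile_path` is a hypothetical helper, and a custom class that merely holds a HuggingFace model as an attribute would still be classified as Option 1.

```python
HF_LIBRARIES = ("transformers", "diffusers")


def choose_compile_path(model) -> str:
    """Heuristic: return which RBLN SDK package to compile `model` with."""
    for cls in type(model).__mro__:
        top_level_package = cls.__module__.split(".")[0]
        if top_level_package in HF_LIBRARIES:
            return "optimum-rbln"  # Option 2: HuggingFace transformers / diffusers API
    return "rebel-compiler"  # Option 1: e.g. a plain torch.nn.Module
```

For the examples in this guide, `choose_compile_path(SimpleConvBNRelu())` would return `"rebel-compiler"` and a diffusers `StableDiffusionXLPipeline` would return `"optimum-rbln"`.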

Option 1: Non-HuggingFace Model

Non-HuggingFace models can be compiled with the rebel-compiler API. Because the example below is a PyTorch model, it can be compiled directly with rebel.compile_from_torch().

import rebel
import torch
import torch.nn as nn

class SimpleConvBNRelu(nn.Module):
    def __init__(self, in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1):
        super(SimpleConvBNRelu, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

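# Switch to inference mode (BatchNorm uses its running statistics)
model = SimpleConvBNRelu().eval()

# Example input used to describe the input shape to the compiler
x = torch.rand(1, 3, 224, 224)

# One (name, shape, dtype) entry per model input
input_info = [
    ("x", list(x.shape), torch.float32),
]

# Compile the model for the RBLN NPU and save the result to disk
compiled_model = rebel.compile_from_torch(model, input_info)
compiled_model.save("simple_conv_bn_relu.rbln")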
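Because the input shape reaches the compiler through input_info, it can help to check inputs against it before running the compiled model. The sketch below assumes that the compiled model expects exactly the shapes given at compile time and that each input_info entry has the (name, shape, dtype) layout used above; `check_input_shapes` is a hypothetical helper that compares shapes only.

```python
def check_input_shapes(arrays, input_info):
    """Raise ValueError if the inputs do not match the shapes recorded in input_info."""
    if len(arrays) != len(input_info):
        raise ValueError(f"expected {len(input_info)} inputs, got {len(arrays)}")
    for array, (name, shape, _dtype) in zip(arrays, input_info):
        if list(array.shape) != list(shape):
            raise ValueError(
                f"input {name!r}: expected shape {list(shape)}, got {list(array.shape)}"
            )
```

For example, with input_info available, calling `check_input_shapes([inputs], input_info)` before `module.run(inputs)` in the inference step catches a mismatched shape early.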

Option 2: HuggingFace Model

HuggingFace models can be compiled with optimum-rbln. The following example shows how to compile a Stable Diffusion XL model with RBLNStableDiffusionXLPipeline(), the optimum-rbln counterpart of the diffusers StableDiffusionXLPipeline() class. Note that export must be set to True to compile the model.

from optimum.rbln import RBLNStableDiffusionXLPipeline

model_id = "stabilityai/sdxl-turbo"

pipe = RBLNStableDiffusionXLPipeline.from_pretrained(
    model_id,
    export=True,
    rbln_guidance_scale=0.0,  # compile for the same guidance_scale=0.0 used at inference
)

pipe.save_pretrained("rbln-sdxl-turbo")
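To see what was written to `rbln-sdxl-turbo`, you can list the compiled files. This sketch assumes that optimum-rbln stores compiled submodules as files with the same `.rbln` extension used in Option 1; check the actual directory layout for your version. `find_compiled_artifacts` is a hypothetical helper.

```python
from pathlib import Path


def find_compiled_artifacts(root) -> list:
    """Return the paths (relative to root, sorted) of all *.rbln files under root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.rbln"))
```

For example, `find_compiled_artifacts("rbln-sdxl-turbo")` lists each compiled file found under the output directory; an empty list suggests the files use a different layout or extension.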

4. Running Inference

Deploy the compiled model to the RBLN NPU and run inference. Inference follows one of the two methods below, depending on how the model was compiled (with rebel-compiler or with optimum-rbln).

Option 1: Non-HuggingFace Model

Non-HuggingFace models run inference through rebel-compiler's rebel.Runtime() API.

import rebel
import torch

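# Input with the same shape the model was compiled with
x = torch.rand(1, 3, 224, 224)

# Load the compiled model onto the NPU and run it on a NumPy input
module = rebel.Runtime("simple_conv_bn_relu.rbln")
inputs = x.numpy()
result = module.run(inputs)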

print("--- Input ---")
print(inputs)
print("--- Result ---")
print(result)
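A common sanity check is to compare the NPU result with the output of the original PyTorch model on the same input. The sketch below assumes `result` is a single array-like output; the tolerances are illustrative, since suitable values depend on the precision the model was compiled with. `outputs_match` is a hypothetical helper.

```python
import numpy as np


def outputs_match(npu_result, reference, rtol=1e-3, atol=1e-3) -> bool:
    """True if both outputs have the same shape and agree elementwise within rtol/atol."""
    npu = np.asarray(npu_result, dtype=np.float32)
    ref = np.asarray(reference, dtype=np.float32)
    return npu.shape == ref.shape and bool(np.allclose(npu, ref, rtol=rtol, atol=atol))
```

Because SimpleConvBNRelu is randomly initialized, the reference must come from the same model instance that was compiled. For example, in one script, compute `reference = model(x).detach().numpy()` and compare it with the `rebel.Runtime` result for the same `x`.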

Option 2: HuggingFace Model

HuggingFace models run inference through optimum-rbln. The following example shows how to run inference on a Stable Diffusion XL model with RBLNStableDiffusionXLPipeline(), the optimum-rbln counterpart of the diffusers StableDiffusionXLPipeline() class. It uses the same RBLNStableDiffusionXLPipeline() class as Option 2 of Section 3, but for inference you pass the local path where the compiled model was saved as model_id and set export to False.

from optimum.rbln import RBLNStableDiffusionXLPipeline

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

pipe = RBLNStableDiffusionXLPipeline.from_pretrained(
    model_id="rbln-sdxl-turbo",
    export=False,
)

image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]

image.save("generated_img.png")

5. Using a Model Serving Framework

The following model serving frameworks can be used with the RBLN SDK to deploy models to production environments. For detailed configuration and advanced deployment options, see each framework's documentation.

NVIDIA Triton Inference Server – a multi-model, multi-framework inference engine (see the NVIDIA Triton Inference Server Documentation for details)
vllm-rbln – a high-performance inference engine for large language models (see the vllm-rbln Documentation for details)
TorchServe – a framework for building, deploying, and running PyTorch models in production (see the TorchServe Documentation for details)
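As an illustration for vllm-rbln, the sketch below builds a request for vLLM's OpenAI-compatible `/v1/completions` endpoint using only the standard library. It assumes the model is already being served that way; the address `http://localhost:8000` and the model name `my-compiled-llm` are placeholders, and `build_completion_request` is a hypothetical helper. See the vllm-rbln documentation for how to start the server.

```python
import json
import urllib.request

# Hypothetical deployment details: replace with your server address and served model name.
BASE_URL = "http://localhost:8000"
MODEL_NAME = "my-compiled-llm"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions endpoint."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the server is running, `urllib.request.urlopen(req)` sends the request, and the generated text is in `choices[0]["text"]` of the JSON response.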