Tutorials
Explore the following tutorials for a better understanding of how to use the RBLN SDK.
RBLN Compiler
These tutorials demonstrate how to use the RBLN Python API (Compile and Runtime APIs) for PyTorch and TensorFlow models.
- PyTorch ResNet50 (Vision): Using the TorchVision library with the RBLN Compile API, with a ResNet50 example.
- PyTorch BERT (NLP): Using PyTorch with the RBLN Compile API, with a BERT-base example.
- TensorFlow EfficientNetB0 (Vision): Using the TF Keras Applications library with the RBLN Compile API, with an EfficientNet-B0 example.
- TensorFlow BERT (NLP): Using TensorFlow with the RBLN Compile API, with a BERT-base example.
- Concurrent Processing: Executing AI models asynchronously with the RBLN Runtime API (AsyncRuntime).
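For orientation, here is a minimal sketch of the compile-and-run flow these tutorials walk through, using the rebel Python package. The compile_from_torch signature and the input specification shown are assumptions for illustration; the linked tutorials give the exact, up-to-date API.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

import rebel  # RBLN Python API (Compile and Runtime)

# Load a pretrained ResNet50 in eval mode.
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()

# Compile for the RBLN NPU. The input spec (name, shape, dtype) is an
# illustrative assumption; see the PyTorch ResNet50 tutorial for the
# exact argument format.
compiled = rebel.compile_from_torch(
    model, input_info=[("input", [1, 3, 224, 224], torch.float32)]
)
compiled.save("resnet50.rbln")

# Run inference on the compiled artifact with the Runtime API.
runtime = rebel.Runtime("resnet50.rbln")
output = runtime.run(torch.randn(1, 3, 224, 224).numpy())
```

The Concurrent Processing tutorial covers the asynchronous counterpart of this flow, AsyncRuntime.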
RBLN C/C++ Runtime API
These tutorials demonstrate how to deploy precompiled models using the RBLN C/C++ Runtime API.
- Image Classification: Deploying the PyTorch ResNet50 model with the RBLN C/C++ Runtime API.
- Object Detection: Deploying the PyTorch YOLOv8 model with the RBLN C/C++ Runtime API.
HuggingFace Model Support
These tutorials demonstrate how to compile and run inference on HuggingFace models using optimum-rbln.
- SDXL-Turbo (Image Generation): Compiling and deploying SDXL-turbo and generating images using optimum-rbln.
- Llama2-7B (Chatbot): Compiling and deploying Llama2-7b models using optimum-rbln across multiple RBLN NPUs.
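As a rough sketch of the optimum-rbln workflow for the Llama2-7B case (the class name follows optimum conventions, but the keyword arguments here are assumptions; the tutorial shows the exact usage):

```python
from optimum.rbln import RBLNLlamaForCausalLM

# export=True compiles the HuggingFace checkpoint for RBLN NPUs at load
# time; the tensor-parallel argument (name assumed) shards the model
# across multiple NPUs.
model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    export=True,
    rbln_tensor_parallel_size=4,
)

# Save the compiled model for later deployment.
model.save_pretrained("llama2-7b-rbln")
```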
Model Profiling
These tutorials demonstrate how to profile and analyze models during inference using the RBLN Runtime.
- Model Profiling: Profiling inference models using the RBLN Runtime.
- Analyze with Perfetto: Analyzing the profiled results using the Perfetto visualization tool.
- YOLOv8 (Object Detection): Profiling YOLOv8l and analyzing the results with Perfetto.
- Stable Diffusion 3 (Image Generation): Profiling stable-diffusion-3-text-to-image and analyzing the results with Perfetto.
- Llama3-8B (Text Generation): Profiling Llama3-8B and analyzing the results with Perfetto.
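In outline, profiling hooks into the Runtime at creation time and emits a trace that Perfetto can open. The activate_profiler flag below is an assumption for illustration; the Model Profiling tutorial documents the actual switch and where the trace file is written.

```python
import numpy as np

import rebel

# Create a runtime with profiling enabled (flag name assumed).
runtime = rebel.Runtime("yolov8l.rbln", activate_profiler=True)

# Each run is recorded; the resulting trace can be loaded into
# Perfetto (https://ui.perfetto.dev) for timeline analysis.
# Input shape is a placeholder for a 640x640 YOLOv8 input.
output = runtime.run(np.random.rand(1, 3, 640, 640).astype(np.float32))
```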
Model Serving
These tutorials demonstrate how to serve precompiled AI models using the Nvidia Triton Inference Server and TorchServe, both of which support vLLM.
- vLLM-Native API: Serving various LLMs using vLLM-native APIs with vllm-rbln.
- OpenAI Compatible Server: Serving vLLM-based LLMs through an OpenAI-compatible server.
- Nvidia Triton Inference Server: Serving LLMs using the Nvidia Triton Inference Server.
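Because an OpenAI-compatible server speaks the standard OpenAI REST API, any stock client can talk to it. A minimal sketch with the openai Python package follows; the endpoint, port, and model id are placeholders for whatever the serving tutorial deploys.

```python
from openai import OpenAI

# Point the standard OpenAI client at the locally served,
# vLLM-based endpoint (base_url and api_key are placeholders).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```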