Performance


This page summarizes the performance of the RBLN ATOM and compares it with GPUs and NPUs widely used for inference. All numbers in the tables below are taken from the official MLPerf™ Inference v3.0 results. Relative latency is each accelerator's single-stream latency divided by that of the RBLN ATOM, so lower is better.

Vision / ResNet50

Vendor	Accelerator	Single-Stream Latency	Relative Latency (vs. RBLN ATOM)
Rebellions	RBLN ATOM	0.239 ms	x1.0
Qualcomm	Cloud AI100	0.336 ms	x1.4
Nvidia	A2 (Ampere)	0.713 ms	x3.0
Nvidia	T4 (Turing)	0.818 ms	x3.4

Language / BERT-Large

Vendor	Accelerator	Single-Stream Latency	Relative Latency (vs. RBLN ATOM)
Rebellions	RBLN ATOM	4.297 ms	x1.0
Qualcomm	Cloud AI100	7.547 ms	x1.8
Nvidia	A2 (Ampere)	8.506 ms	x2.0
Nvidia	T4 (Turing)	6.093 ms	x1.4
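
The Relative Latency column can be reproduced from the Single Stream Latency column. The following is a minimal sketch, not part of the RBLN SDK: the names `LATENCIES_MS` and `relative_latency` are hypothetical, and it assumes the ratio is each accelerator's latency divided by the RBLN ATOM latency, rounded to one decimal place. The latency values are copied from the two tables above.

```python
# Single-stream latencies in milliseconds, copied from the tables above.
LATENCIES_MS = {
    "ResNet50": {
        "RBLN ATOM": 0.239,
        "Cloud AI100": 0.336,
        "A2 (Ampere)": 0.713,
        "T4 (Turing)": 0.818,
    },
    "BERT-Large": {
        "RBLN ATOM": 4.297,
        "Cloud AI100": 7.547,
        "A2 (Ampere)": 8.506,
        "T4 (Turing)": 6.093,
    },
}


def relative_latency(latencies, baseline="RBLN ATOM"):
    """Divide every latency by the baseline latency (assumed: RBLN ATOM)
    and round to one decimal place, as in the tables."""
    base = latencies[baseline]
    return {name: round(ms / base, 1) for name, ms in latencies.items()}


if __name__ == "__main__":
    for model, latencies in LATENCIES_MS.items():
        for name, ratio in relative_latency(latencies).items():
            print(f"{model:<10}  {name:<12}  x{ratio}")
```

Under these assumptions the computed ratios match the published columns for both ResNet50 and BERT-Large.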

For more details, please visit the official MLPerf™ Inference v3.0 website.