# Release Notes

Each change in these release notes is written in English for clarity.
## 2025.03.28
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo |
|---|---|---|---|---|---|
| 2025.03.28.0 | v1.2.92 | v0.7.3 | v0.7.3.post2 | v0.7.3 | v0.5.8 |
- Install Command
- RBLN Compiler:
    - Features
        - Added support for Paged Attention
        - Expanded the coverage of the LayerNorm operation
    - Optimization
        - Improved the memory tiling algorithm for better utilization of the Neural Engine Clusters
    - BugFix
        - Fixed an issue where the RBLN_NUM_THREADS environment variable did not work with the AsyncRuntime
    - API
        - Added a context manager, the profile class, to support saving profiled data to a user-defined path for the RBLN Profiler (see the sketch at the end of this release entry)
- Optimum RBLN:
    - Added functions:
        - RBLNKandinskyV22Pipeline()
        - RBLNKandinskyV22Img2ImgPipeline()
        - RBLNKandinskyV22CombinedPipeline()
        - RBLNKandinskyV22Img2ImgCombinedPipeline()
    - BugFix:
        - Removed the ad-hoc forward statement in RBLNXLMRobertaModel() to fix the "the number of inputs exceeds the expected number" error for the BGE-M3 model
    - Features:
        - Updated to support Paged Attention
            - Note: Inference with the generate() API cannot be completed when no memory blocks are available for allocation, due to the absence of a Paged Attention block manager. This issue can be resolved by using vLLM RBLN, which fully supports Paged Attention
    - Others:
        - The usage of class property, which has been deprecated since Python 3.11, has been removed from the codebase to align with Python's deprecation schedule
- vLLM RBLN:
    - Updated to support Paged Attention
- RBLN Model Zoo:
    - Added new models:
        - HuggingFace
            - EXAONE-3.5-32b
            - BGE-Small-en-v1.5
            - BGE-Base-en-v1.5
            - BGE-Large-en-v1.5
            - Kandinsky v2.2
                - Text2Image
                - Image2Image
    - Updated to restructure the directories for the HuggingFace models, differentiating between the diffusers and transformers models
    - Changed the default input image size from 512 to 768 for the Kandinsky v2.2 Inpaint examples
- RBLNServe:
    - Deprecation Notice: As announced in the RBLN SDK 2025.02.28 release, RBLNServe has been deprecated and removed in this release. Please transition to alternative solutions such as the Nvidia Triton Inference Server, vLLM, or TorchServe
- Other Changes:
    - SDK Documentation:
        - Added a Quick Start page as part of a tutorial to help users understand how to use the RBLN SDK
        - Added an Error Codes page that provides explanations of error codes encountered during model compilation, including the affected compilation pass and their underlying causes
        - Added a Performance Tuning section to Software > RBLN Compiler > Troubleshoot
        - Added tutorials using Flash Attention:
            - vLLM RBLN
            - Nvidia Triton Inference Server
            - TorchServe
        - Updated the tutorials that demonstrate using the RBLN SDK with Llama2-7B to now use Llama3-8B:
            - Optimum RBLN
            - vLLM RBLN
            - Nvidia Triton Inference Server
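
A minimal, illustrative sketch of the new profiling context manager follows. The entry point (`rebel.profile`), the output-path keyword (`save_dir`), and the runtime-creation calls are assumptions made for illustration; only the feature itself (a context manager that saves RBLN Profiler data to a user-defined path) comes from the note above, so consult the RBLN Profiler documentation for the confirmed API.

```python
import numpy as np
import rebel  # RBLN Compiler Python package

# Sketch only: `rebel.profile` and its `save_dir` keyword are assumed names,
# not the confirmed API. The feature (saving profiled data to a user-defined
# path via a context manager) is what this release note describes.
compiled_model = rebel.RBLNCompiledModel("./resnet50.rbln")  # previously compiled model (illustrative path)
runtime = rebel.Runtime(compiled_model)

dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)
with rebel.profile(save_dir="./profiler_output"):  # profiled data is written under this path
    runtime.run(dummy_input)
```
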
## 2025.02.28
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2025.02.28.0 | v1.2.92 | v0.7.2 | v0.7.2 | v0.7.2 | v0.5.7 | v0.5.0 |
- Install Command
- RBLN Compiler:
    - Features
        - Improved the attention kernel to support multiple batches
        - Added new supported operations
            - depth_to_space
            - CosineSimilarity
            - MaxUnpool2d
            - polyval
            - mirror_pad
    - Optimization
        - Optimized prefill performance of the Flash Attention supported models
        - Improved the internal buffer management algorithm for faster model build time
        - Enhanced the LLM attention mask transfer logic for RSD models
    - API (see the sketch at the end of this release entry)
        - Added debugging functions to the RuntimeBase class
            - get_elasped_times() - Calculate the average total execution time of operations in microseconds
            - get_elasped_device_times() - Calculate the average device-side execution time of operations in microseconds
            - get_reports() - Retrieve all pending reports from the runtime
            - flush_reports() - Fetch and discard all pending reports from the runtime
        - Added a new class method inspect() to the RBLNCompiledModel class - Provides metadata information about the compiled model without loading it into host memory
- Optimum RBLN:
    - Added functions:
        - RBLNBertForMaskedLM()
        - RBLNKandinskyV22InpaintCombinedPipeline()
        - RBLNKandinskyV22InpaintPipeline()
        - RBLNKandinskyV22PriorPipeline()
    - BugFix:
        - Resolved an incorrect version dependency warning for the RBLN Compiler
        - Fixed an issue with handling large input images in the Llava-Next model
        - Corrected the behavior of rbln_model_input_names() in BERT models
    - Features:
        - Updated decoder-only models to call the multi-batch attention kernel, simplifying the models and the attention kernel implementation
    - Others:
        - Aligned the version number of Optimum RBLN with the version number of the RBLN Compiler to eliminate any confusion for users
        - Updated to support the latest transformers (v4.48.3) and diffusers (v0.31.0)
- vLLM RBLN:
    - Updated to sync with vLLM v0.7.1
    - Aligned the version number of vLLM RBLN with the version number of the RBLN Compiler to eliminate any confusion for users
- RBLN Model Zoo:
    - Added new models:
        - HuggingFace
            - DeepSeek-R1-Distill-Llama-8B
            - DeepSeek-R1-Distill-Llama-70B
            - DeepSeek-R1-Distill-Qwen-1.5B
            - DeepSeek-R1-Distill-Qwen-7B
            - DeepSeek-R1-Distill-Qwen-14B
            - DeepSeek-R1-Distill-Qwen-32B
            - Kandinsky v2.2
                - Inpainting
            - Ko-Reranker
            - KR-SBERT
        - PyTorch
            - LaBSE
    - Added supplementary guides for the TorchServe tutorials
- RBLNServe:
    - Deprecation Notice: Please be advised that RBLNServe will be deprecated after March 2025. We recommend transitioning to alternative solutions (Triton Inference Server, vLLM, or TorchServe) before this date to ensure continued support and functionality. After the deprecation, RBLNServe will be removed in future versions. We encourage all users to plan accordingly and migrate to the recommended alternatives
- Other Changes:
    - Added new documentation pages
    - Documentation refactoring:
        - The following pages were revised to provide clearer explanations and better guidance for RBLN SDK users:
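
Below is a brief, illustrative sketch of the new RuntimeBase debugging helpers and the inspect() class method. The method names come from the note above; the file path, the construction calls, and the inspect() argument are assumptions for illustration.

```python
import rebel  # RBLN Compiler Python package

# Illustrative only: the path and construction calls are assumptions; the
# method names (inspect, get_elasped_times, get_elasped_device_times,
# get_reports, flush_reports) are taken from this release note.
metadata = rebel.RBLNCompiledModel.inspect("./resnet50.rbln")  # metadata without loading into host memory
print(metadata)

runtime = rebel.Runtime("./resnet50.rbln")  # Runtime derives from RuntimeBase
# ... run a few inferences with `runtime` here ...

print(runtime.get_elasped_times())         # average total execution time of operations (microseconds)
print(runtime.get_elasped_device_times())  # average device-side execution time (microseconds)
reports = runtime.get_reports()            # retrieve all pending reports from the runtime
runtime.flush_reports()                    # fetch and discard all pending reports
```
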
## 2025.02.04: Breaking Changes
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2025.02.04.0 | v1.2.92 | v0.7.1 | v0.2.0 | v0.2.0 | v0.5.6 | v0.5.0 |
Note
BREAKING CHANGES: Please update the RBLN Compiler to the latest version (v0.7.1 or higher) for compatibility with the updated RBLN Driver.
- Install Command
- RBLN Driver:
    - Enabled the Low Power Management (LPM) feature
    - Added support for RSD with long-sequence LLM models (>32K)
    - Supported driver package installation on RHEL and AlmaLinux
    - Improved hDMA performance
- RBLN Compiler:
    - Features
        - Updated the RBLN Compiler for compatibility with the RBLN Driver
        - Added support for Flash Attention
        - Added support for the depth_to_space operation
    - Optimization
        - Improved memory tiling algorithms
        - Enhanced collective communication processing for RSD models
        - Increased the write speed of *.rbln files
    - API
        - Added a mode option to the options argument in torch.compile()
    - Functionality
        - Initial release of the RBLN Profiler
            - The RBLN Compiler v0.7.1 includes the performance profiler, which allows users to view the time spent on each step of the model inference process. For more details, refer to the RBLN Profiler.
- Optimum RBLN:
    - Public release of Optimum RBLN: GitHub Repository
    - Added rbln_attn_impl and rbln_kvcache_partition_len arguments to the decoder-only transformer model APIs to support Flash Attention (see the sketch at the end of this release entry)
- vLLM RBLN:
    - Updated to support Flash Attention
- RBLN Model Zoo:
    - Added new models
        - HuggingFace
            - Llama3.1-8b
            - Llama3.1-70b
            - Llama3.2-3b
            - Llama3.3-70b
            - KONI-Llama3.1-8b
        - PyTorch
            - YOLOv10-N/S/M/B/L/X
- RBLNServe:
    - Pinned the rebel-compiler version to <0.8, >=0.7
- Other Changes:
    - Documentation refactoring:
        - Added a new section for Software > RBLN Profiler
        - Consolidated model serving content into a single section under Software > Model Serving, which now includes:
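
A minimal sketch of the new Flash Attention arguments is shown below. Only rbln_attn_impl and rbln_kvcache_partition_len come from this release note; the class, model ID, the "flash_attn" value, and the remaining keyword arguments are assumptions chosen for illustration.

```python
from optimum.rbln import RBLNLlamaForCausalLM

# Hypothetical example: compile a decoder-only model with Flash Attention.
model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model ID
    export=True,                            # compile the HuggingFace checkpoint for RBLN NPUs
    rbln_max_seq_len=32768,                 # long-sequence setting (assumed option)
    rbln_attn_impl="flash_attn",            # Flash Attention implementation (assumed value)
    rbln_kvcache_partition_len=4096,        # KV-cache partition length for Flash Attention (assumed value)
)
model.save_pretrained("llama3-8b-flash-attn")
```
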
## 2024.12.27
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.12.27.0 | v1.1.67 | v0.6.2 | v0.1.15 | v0.1.3 | v0.5.5 | v0.4.0 |
- Install Command
- RBLN Compiler:
    - Added the get_total_device_alloc() method to the RBLNCompiledModel class, enabling efficient retrieval of the total device memory allocation (in bytes) used by the compiled graph across all NPUs (see the sketch at the end of this release entry)
    - Fixed a bug in the split operation handling logic
    - Added exception handling for previously uncovered edge cases in the error handling logic
- Optimum RBLN:
    - Added functions:
        - RBLNStableDiffusionInpaintPipeline()
        - RBLNStableDiffusionXLInpaintPipeline()
        - RBLNStableDiffusionXLControlNetPipeline()
        - RBLNStableDiffusionXLControlNetImg2ImgPipeline()
        - RBLNStableDiffusion3Pipeline()
        - RBLNStableDiffusion3Img2ImgPipeline()
        - RBLNStableDiffusion3InpaintPipeline()
    - Removed the dependency on optimum
        - This change eliminated the automatic installation of GPU-related dependencies, resulting in a significantly faster installation process
- vLLM RBLN:
    - Updated to sync with vLLM v0.6.5
    - Updated to support EXAONE v3.5 models
- RBLN Model Zoo:
    - Added new models
        - HuggingFace
            - EXAONE-3.5-2.4b
            - EXAONE-3.5-7.8b
            - Stable Diffusion v3
                - Text to image
                - Image to image
                - Inpainting
            - Stable Diffusion
                - Inpainting
            - Stable Diffusion XL
                - Inpainting
                - Text to image + ControlNet
                - Image to image + ControlNet
        - PyTorch
            - YOLOv5-Face
    - Improved the formatting of all model code for better readability and maintainability
    - Added new examples demonstrating how to use vLLM's native APIs with a wider range of model architectures:
        - Decoder-only (Llama3)
        - Encoder-decoder (BART)
        - Multi-modal (Llava-Next)
    - Added supplementary guides for model serving:
        - Tutorial > Advanced > LLM Serving > LLM Serving with Continuous Batching: RBLN Model Zoo Link
        - Software > Model Serving > Nvidia Triton Inference Server: RBLN Model Zoo Link
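
A short, hedged sketch of the new get_total_device_alloc() method follows. The compile call, the input_info format, and the model are assumptions for illustration; only get_total_device_alloc() and its meaning (total device memory in bytes across all NPUs) come from the note above.

```python
import torch
import torchvision
import rebel  # RBLN Compiler Python package

# Illustrative: compile a small model, then query its device memory footprint.
model = torchvision.models.resnet18(weights=None).eval()
compiled_model = rebel.compile_from_torch(
    model,
    input_info=[("input", [1, 3, 224, 224], torch.float32)],  # assumed input_info format
)

total_bytes = compiled_model.get_total_device_alloc()  # total device memory allocation in bytes, across all NPUs
print(f"Device memory required: {total_bytes / (1024 ** 2):.1f} MiB")
```
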
## 2024.11.27
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.11.27.0 | v1.1.67 | v0.6.1 | v0.1.13 | v0.1.2 | v0.5.4 | v0.4.0 |
Note
Deprecation Notice: Python 3.8 Support
As part of our commitment to maintaining compatibility with supported and secure versions of Python, we are officially deprecating support for Python 3.8. This version will no longer be included in future releases, and users are encouraged to upgrade to a more recent Python version to ensure continued support and compatibility.
New Additions: Python 3.11 and Python 3.12
We are pleased to announce that Python 3.11 and Python 3.12 are now fully supported and included in our release package.
- Install Command
- RBLN Compiler:
    - Improved operation efficiency:
        - New fusion logic for masked softmax
        - Accelerated the take operation for tensor indexing
    - Enhanced communication logic for RSD models
    - Refactored the .rbln file format
- Optimum RBLN:
    - Added functions:
        - RBLNT5EncoderModel()
    - Refactored the RBLNStableDiffusion pipelines to enhance functionality and flexibility:
        - LoRA support:
            - The pipelines now include support for Low-Rank Adaptation (LoRA).
        - rbln_config input argument:
            - A new rbln_config argument has been introduced. This configuration is designed specifically for RBLN compilation and includes:
                - Global settings: Parameters such as npu, device, and create_runtimes
                - Image generation settings: Parameters such as batch_size, img_height, img_width, and guidance_scale
        - For more detailed information, please refer to the Model API documentation and the RBLN Model Zoo example
    - Added a new from_model() method to the RBLN<ModelName>ForCausalLM classes. This method enables LoRA support by accepting a HuggingFace PreTrainedModel as input, allowing the base model and LoRA adapter to be merged using the merge_and_unload() approach. For more details, please refer to the RBLN Model Zoo example and the sketch at the end of this release entry
    - Enabled support for various RoPE (Rotary Position Embedding) methods
        - default
        - linear
        - dynamic
        - yarn
        - longrope
        - llama3
- vLLM RBLN:
    - Updated to sync with vLLM v0.6.4
    - The --compiled_model_dir configuration has been deprecated and will be removed in a future release. Users are encouraged to use the --model argument instead. Please refer to the LLM Serving with Continuous Batching tutorial for an actual use case
- RBLN Model Zoo:
    - Added new models
        - HuggingFace
            - Qwen2.5-7b/14b
            - Llama3-8b + LoRA
            - StableDiffusion + LoRA
            - StableDiffusionXL + LoRA
    - Dependency Updates
        - PyTorch: updated to version 2.5.1
        - TensorFlow: updated to version 2.18.0
    - Model deprecation
        - The 3DDFA model has been deprecated due to maintenance discontinuation
- RBLNServe:
    - Pinned the rebel-compiler version to <0.7, >=0.6
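
The following is an illustrative sketch of the new from_model() path for LoRA. The model IDs, adapter, and rbln_* options are assumptions; from_model(), the HuggingFace PreTrainedModel input, and the merge_and_unload() flow come from the note above.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM
from optimum.rbln import RBLNLlamaForCausalLM

# Merge a LoRA adapter into its base model (both IDs below are hypothetical).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
merged = PeftModel.from_pretrained(base, "my-org/my-lora-adapter").merge_and_unload()

# Compile the merged PreTrainedModel for RBLN NPUs via from_model().
rbln_model = RBLNLlamaForCausalLM.from_model(
    merged,
    rbln_max_seq_len=4096,  # assumed compilation option
    rbln_batch_size=1,      # assumed compilation option
)
rbln_model.save_pretrained("llama2-7b-lora-merged")
```
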
## 2024.10.30
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.10.30.0 | v1.1.67 | v0.5.12 | v0.1.12 | v0.1.0 | v0.5.3 | v0.3.0 |
- Install Command
- RBLN Compiler:
    - Updated to support the cosine_similarity operation
    - Enabled runtime initialization with an RBLNCompiledModel and deprecated the path argument (see the sketch at the end of this release entry)
    - Added a double buffering on/off option on AsyncRuntime creation with the parallel argument
    - Added an example_info argument to compile_from_torch() to support compilation without InputInfo creation
    - Added a device argument to torch.compile to specify the NPU device ID for execution
- Optimum RBLN:
    - Added functions:
        - RBLNQwen2ForCausalLM()
        - RBLNExaoneForCausalLM()
        - RBLNPhiForCausalLM()
        - RBLNViTImageClassification()
    - Updated to support the latest transformers (v4.45.2)
- vLLM RBLN:
    - Updated to support Qwen2, EXAONE, and Phi-2 architectures
- RBLN Model Zoo:
    - Added new models
        - HuggingFace
            - Qwen2-7b
            - EXAONE-3.0-7.8b
            - Salamandra-7b
            - Phi-2
            - ViT-large
            - Whisper-large-v3-turbo
        - PyTorch Dynamo
            - SAM2.1_hiera_large
    - Separated the framework-specific requirements.txt files into individual requirements.txt files for each model
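
Below is a hedged sketch of the runtime-related additions in this release. The compile call, the input_info format, and the exact constructor shapes are assumptions; the two behaviors shown (initializing a runtime from an RBLNCompiledModel instead of a file path, and toggling double buffering through the parallel argument when creating an AsyncRuntime) come from the notes above.

```python
import torch
import torchvision
import rebel  # RBLN Compiler Python package

# Illustrative compilation; the input_info format is an assumption.
model = torchvision.models.resnet18(weights=None).eval()
compiled_model = rebel.compile_from_torch(
    model,
    input_info=[("input", [1, 3, 224, 224], torch.float32)],
)

# Synchronous runtime created directly from the compiled model (the path argument is deprecated).
runtime = rebel.Runtime(compiled_model)

# Asynchronous runtime with double buffering explicitly enabled via `parallel`.
async_runtime = rebel.AsyncRuntime(compiled_model, parallel=True)
```
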
## 2024.09.27
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.09.27.0 | v1.1.67 | v0.5.10 | v0.1.11 | v0.0.7 | v0.5.2 | v0.3.0 |
- Install Command
- RBLN Driver:
    - Added runtime power management with dynamic PCIe link speed change and PCIe ASPM (Active State Power Management)
    - Improved P2P throughput
    - Enhanced stability for Rebellions Scalable Design (RSD)
- RBLN Compiler:
    - Improved the internal memory management algorithm
    - Updated the runtime description to show the NPU version
    - Refactored the .rbln file format
- Optimum RBLN:
    - Added functions:
        - RBLNBertModel()
        - RBLNBartModel()
        - RBLNLlavaNextForConditionalGeneration()
    - Updated RBLNWhisperForConditionalGeneration to support generating token timestamps and long-form transcription
- vLLM RBLN:
    - Updated to support Llava-Next, BART, and T5 models
- RBLN Model Zoo:
    - Added new models
        - HuggingFace
            - Llava-Next
            - E5-Base-4k
            - KoBART
            - BGE-Reranker-Base/Large
        - PyTorch
            - MotionBERT Action Recognition
    - Updated Whisper models to support generating token timestamps and long-form transcription
- Others:
    - Split the LLM Serving tutorial into "with Triton Inference Server" and "with Continuous Batching"
    - Added a standalone vLLM API example to the LLM Serving tutorial
## 2024.08.30
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.08.30.0 | v1.0.1 | v0.5.9 | v0.1.9 | - | v0.5.1 | v0.3.0 |
| 2024.08.30.1 | v1.0.5 | v0.5.9 | v0.1.9 | v0.0.6 | v0.5.1 | v0.3.0 |
- Install Command
- RBLN Compiler:
    - Added the model_description() method to the Runtime class
    - Updated to support the where and einsum operations
    - Fixed a bug in the strided_slice operation
- Optimum RBLN:
    - Added functions:
        - RBLNGemmaForCausalLM()
        - RBLNMistralForCausalLM()
        - RBLNDistilBertForQuestionAnswering()
- vLLM RBLN:
    - Updated to support Gemma and Mistral architectures
- RBLN Model Zoo:
    - Added new models
        - HuggingFace
            - Gemma-2B
            - Gemma-7B
            - Mistral-7B
            - DistilBERT
        - PyTorch
            - MotionBERT
        - PyTorch Dynamo
            - YOLOv3
            - YOLOv4
            - YOLOv5
            - YOLOv6
            - YOLOvX
## 2024.08.16
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.08.16.0 | v1.0.1 | v0.5.8 | v0.1.8 | - | v0.5.0 | v0.3.0 |
| 2024.08.16.1 | v1.0.5 | v0.5.8 | v0.1.8 | v0.0.4 | v0.5.0 | v0.3.0 |
- Install Command
- RBLN Compiler:
    - Improved visualization of the compilation progress bar
    - Optimized performance for long-sequence LLM models
    - Reduced DRAM memory consumption for RSD models
    - Fixed a bug in the PReLU handling logic
    - Initial release of the C/C++ runtime libraries:
        - Refer to Software > API > Language Binding > C/C++ for installation, API docs, and tutorials.
- Optimum RBLN:
    - Added functions:
        - RBLNRobertaForMaskedLM()
        - RBLNRobertaForSequenceClassification()
        - RBLNXLMRobertaModel()
        - RBLNXLMRobertaForSequenceClassification()
- vLLM RBLN:
    - Updated to support GPT2 and Mi:dm architectures
- RBLN Model Zoo:
    - Initial release to support torch.compile() in PyTorch 2.0 (see the sketch at the end of this release entry):
        - Visit the Tutorial > Basic > PyTorch (Vision), Tutorial > Basic > PyTorch (NLP), and Software > API > Python API pages for more information
        - Examples can be found in the RBLN Model Zoo repository
    - Added new models (HuggingFace)
        - Mi:dm-7b
        - BGE-M3
        - BGE-Reranker-v2-M3
        - SecureBERT
        - Roberta
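
A minimal sketch of the torch.compile() path mentioned above follows. The backend name "rbln" and the need to import rebel to register it are assumptions for illustration; the Tutorial > Basic > PyTorch pages have the confirmed usage.

```python
import torch
import torchvision
import rebel  # assumed to register the RBLN backend for torch.compile  # noqa: F401

model = torchvision.models.resnet18(weights=None).eval()
compiled = torch.compile(model, backend="rbln")  # backend name is an assumption

with torch.no_grad():
    out = compiled(torch.zeros(1, 3, 224, 224))  # first call triggers compilation for the NPU
print(out.shape)
```
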
## 2024.07.25
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.07.25.0 | v1.0.1 | v0.5.7 | v0.1.7 | - | v0.4.1 | v0.3.0 |
| 2024.07.25.1 | v1.0.5 | v0.5.7 | v0.1.7 | v0.0.3 | v0.4.1 | v0.3.0 |
- Install Command
- RBLN Compiler:
    - Optimized RSD performance for long-sequence LLMs
- Optimum RBLN:
    - Added a warning message for dependency version compatibility
    - Added the RBLNDPTForDepthEstimation() function
    - Fixed a memory leak bug in GPT models
- RBLN Model Zoo:
    - Added a new model (HuggingFace)
        - DPT-large
- Others:
    - Updated the LLM Serving tutorial page
        - Revised the Serving with Triton Inference Server and Continuous Batching Support with vllm-rbln sections
        - Added an OpenAI Compatible API Server section
## 2024.07.10
| SDK Version | Driver | Compiler | Optimum RBLN | vLLM RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|---|
| 2024.07.10.0 | v1.0.1 | v0.5.2 | v0.1.4 | - | v0.4.0 | v0.3.0 |
| 2024.07.10.1 | v1.0.5 | v0.5.2 | v0.1.4 | v0.0.3 | v0.4.0 | v0.3.0 |
- RBLN Driver:
    - Enhanced stability for Rebellions Scalable Design (RSD)
- RBLN Compiler:
    - Updated to support continuous batching
- Optimum RBLN:
    - Updated the LlamaForCausalLM() class to support continuous batching
- vLLM RBLN:
    - Initial release to support continuous batching
    - Updated the LLM Serving page to include information on continuous batching
- RBLN Model Zoo:
    - Public release of the RBLN Model Zoo
    - Added a new model (PyTorch)
        - ConvTasNet
    - Miscellaneous:
        - Removed pipeline() from the BERT mlm inference.py
        - Removed pipeline() from the BERT qa inference.py
        - Added trust_remote_code=True to the load_dataset() method in AST & Wav2Vec.
## 2024.06.11: Breaking Changes
| SDK Version | Driver | Compiler | Optimum RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|
| 2024.05.23.0 | v0.10.42 | v0.3.11 | v0.1.0 | v0.3.6 | v0.1.5 |
| 2024.06.11.0 | v1.0.1 | v0.5.0 | v0.1.1 | v0.3.10 | v0.3.0 |
Note
BREAKING CHANGES: Please update the RBLN Compiler to the appropriate version as below for compatibility with the updated RBLN Driver. You can check your RBLN Driver version with the rbln-stat -j | grep KMD_version command.
- 0.10.42: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.3.11
- 1.0.1: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.5.0
- RBLN Driver:
    - Stable release for Rebellions Scalable Design (RSD)
- RBLN Compiler:
    - Updated the RBLN Compiler to be compatible with the RBLN Driver
    - Added utility APIs (see the sketch at the end of this release entry):
        - npu_is_available()
        - get_npu_name()
- Optimum RBLN:
    - Updated model APIs
        - Please refer to the RBLN Model Zoo below
- RBLN Model Zoo:
    - Added new models (HuggingFace)
        - With Rebellions Scalable Design (RSD)
            - Llama3-8b
            - SOLAR-10.7b
            - EEVE-Korean-10.8b
            - SDXL-base-1.0
            - ControlNet
- RBLNServe:
    - Pinned the rebel-compiler version to <0.6, >=0.5
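
A small sketch of the utility APIs added in this release is shown below. The function names come from the note above; calling them with no arguments and the boolean/string return types are assumptions.

```python
import rebel  # RBLN Compiler Python package

# Illustrative usage of the new utility APIs.
if rebel.npu_is_available():                      # assumed to return a boolean
    print("NPU detected:", rebel.get_npu_name())  # assumed to return the NPU name as a string
else:
    print("No RBLN NPU is available on this host")
```
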
## 2024.05.23: Breaking Changes
| SDK Version | Driver | Compiler | Optimum RBLN | Model Zoo | RBLNServe |
|---|---|---|---|---|---|
| 2024.05.23.0 | v0.10.42 | v0.3.11 | v0.1.0 | v0.3.6 | v0.1.5 |
| 2024.05.23.1 | v0.12.37 | v0.4.0 | v0.1.0 | v0.3.6 | v0.2.0 |
Note
BREAKING CHANGES: Please update the RBLN Compiler to the appropriate version as below for compatibility with the updated RBLN Driver. You can check your RBLN Driver version with the rbln-stat -j | grep KMD_version command.
- 0.10.42: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.3.11
- 0.12.37: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.4.0
- RBLN Driver:
    - Added support for Rebellions Scalable Design (RSD)
    - rbln-stat (CLI tool) update: Added new columns Name and Power for the NPU version and power consumption, respectively
- RBLN Compiler:
    - Updated the RBLN Compiler to be compatible with the RBLN Driver
    - Updated the input arguments of the Python user APIs
    - Added new user APIs for concurrent processing
    - Enabled LLM compilation & inference for Rebellions Scalable Design (RSD)
    - Added a new page - Nvidia Triton Inference Server
- Optimum RBLN:
    - Initial release
    - Added new pages - HuggingFace Model Support
- RBLN Model Zoo:
    - Added new models (HuggingFace)
        - With Rebellions Scalable Design (RSD)
            - Llama2-7b
            - Llama2-13b
            - GPT2, GPT2-medium/large/xl
            - T5-small/base/large/3B
            - BART-base/large
            - BERT-base/large
            - Stable Diffusion v1.5
            - SDXL-turbo
            - Whisper-tiny/base/small/medium/large
            - Wav2Vec2
            - Audio Spectrogram Transformer
- RBLNServe:
    - Pinned the rebel-compiler version to <0.5, >=0.4
## 2024.01.31: Breaking Changes
| SDK Version | Driver | Compiler | Model Zoo | RBLNServe |
|---|---|---|---|---|
| 2024.01.31.0 | v0.10.42 | v0.3.5 | v0.2.0 | v0.1.5 |
Note
BREAKING CHANGES: Please update the RBLN Compiler to the latest version (v0.3.5 or higher) for compatibility with the updated RBLN Driver.
- RBLN Driver:
    - Refactored the device internal command processing logic for stability & scalability
- RBLN Compiler:
    - Updated the RBLN Compiler to be compatible with the RBLN Driver
    - Updated the device memory scheduling logic
    - Enhanced functionality for the operation fusion logic
    - Updated the supported OP list for both TensorFlow and PyTorch
- RBLN Model Zoo:
    - Added new models (PyTorch):
        - YOLOv4: v4/v4-csp-s-mish/v4-csp-x-mish
        - Video ResNet: r3d_18/mc3_18/r2plus1D_18
        - Video S3D: s3d
    - Changed default input size:
        - YOLOv3/4/5/6/7/8
        - deeplabv3_resnet50/resnet101/mobilenetv3_large, fcn_resnet50/101, unet
    - Restructured directories:
        - PyTorch image classification examples were moved from rbln_model_zoo/pytorch/vision/classification to rbln_model_zoo/pytorch/vision/image_classification
- RBLNServe:
    - Pinned the rebel-compiler version to <0.4, >=0.3
## 2023.10.06
| SDK Version | Driver | Compiler | Model Zoo | RBLNServe |
|---|---|---|---|---|
| 2023.10.06.0 | v0.9.34 | v0.2.13 | v0.1.9 | v0.1.4 |
- RBLN Compiler:
    - Updated the version parsing module of the runtime APIs
    - Updated the runtime input size calculation logic
    - Enhanced functionality for tensor slicing operations
- RBLNServe:
    - Updated the configuration for the gRPC/REST protocol
## 2023.09.12
| SDK Version | Driver | Compiler | Model Zoo | RBLNServe |
|---|---|---|---|---|
| 2023.09.12.0 | v0.9.34 | v0.2.10 | v0.1.9 | v0.1.1 |
- RBLN Compiler:
    - Enabled print() for the rebel.Runtime module - print(module) will show basic information about the loaded model
    - Refactored the compiler's internal large-op handling passes for scalability
    - Updated the error message handling logic
    - Fixed a bug in a type cast pass
- RBLN Model Zoo:
    - Updated submodule - YOLOv3
- RBLNServe:
    - Added the --version command
## 2023.08.18
| SDK Version | Driver | Compiler | Model Zoo | RBLNServe |
|---|---|---|---|---|
| 2023.08.18.0 | v0.9.34 | v0.2.1 | v0.1.8 | v0.1.0 |
- RBLN Compiler:
    - Fixed a bug for the destruction issue in rebel.Runtime
- RBLNServe:
    - Initial release
    - Added a new page - RBLNServe (Model Server)
## 2023.08.12: Breaking Changes
| SDK Version | Driver | Compiler | Model Zoo |
|---|---|---|---|
| 2023.08.12.0 | v0.9.34 | v0.2.0 | v0.1.8 |
Note
BREAKING CHANGES: Please update the RBLN Compiler to the latest version (v0.2.0 or higher) for compatibility with the updated RBLN Driver.
- RBLN Driver:
    - Refactored the host-device communication protocol for stability & scalability
- RBLN Compiler:
    - Updated the RBLN Compiler to be compatible with the RBLN Driver
- Others:
    - Added a new page - Kubernetes Support
## 2023.07.31
| SDK Version | Driver | Compiler | Model Zoo |
|---|---|---|---|
| 2023.07.31.0 | v0.8.44 | v0.1.17 | v0.1.8 |
- RBLN Compiler:
    - Enhanced functionality for normalization operations
    - Updated the compiler's internal scheduling logic
    - Updated the error message handling logic
- RBLN Model Zoo:
    - Updated requirements.txt to use ultralytics 8.0.145
    - Applied ultralytics 8.0.145 to YOLOv8
## 2023.07.10
| SDK Version | Driver | Compiler | Model Zoo |
|---|---|---|---|
| 2023.07.10.0 | v0.8.44 | v0.1.14 | v0.1.7 |
- RBLN Driver:
    - Enhanced stability for device reset and recovery
    - rbln-stat (CLI tool) update: status categorization of the process
- RBLN Compiler:
    - Updated input arguments for compile_from_torchscript()
    - Enhanced functionality for unary and binary operations
    - Optimized build time
RBLN Model Zoo:- Added new models (PyTorch):
- YOLOv6: v6s/v6n/v6m/v6l
- YOLOv7: v7-tiny/v7/v7x
- YOLOv8: v8s/v8n/v8m/v8l/v8x
- Added new models (TF Keras Applications)
- MobileNetV3: Small/Large
- ConvNeXt: Tiny/Small/Base/Large/XLarge
- RegNetX: 002/004/006/008/016/032/040/064/080/120/160/320
- RegNetY: 002/004/006/008/016/032/040/064/080/120/160/320
- Added new models (PyTorch):
## 2023.06.20
| SDK Version | Driver | Compiler | Model Zoo |
|---|---|---|---|
| 2023.06.20.0 | v0.7.34 | v0.1.8 | v0.1.5 |
- RBLN Compiler:
    - Added a new compile function - compile_from_torchscript()
    - Enhanced functionality for matrix multiplication and pooling operations
    - Optimized device memory scheduling
- RBLN Model Zoo:
    - Added new models (PyTorch):
        - YOLOv3: v3-tiny/v3/v3-spp
        - YOLOv5: v5s/v5n/v5m/v5l/v5x
## 2023.05.26
| SDK Version | Driver | Compiler | Model Zoo |
|---|---|---|---|
| 2023.05.26.0 | v0.7.34 | v0.1.5 | v0.1.4 |
- Initial release