Release Notes

Each change in these release notes is written in English for clarity.

2025.02.04: Breaking Changes

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2025.02.04.0	v1.2.92	v0.7.1	v0.2.0	v0.2.0	v0.5.6	v0.5.0

Note

BREAKING CHANGES: Please update the RBLN Compiler to the latest version (v0.7.1 or higher) for compatibility with the updated RBLN Driver.

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.7.1 vllm-rbln==0.2.0
pip3 install optimum-rbln==0.2.0
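
After running the install commands above, it can help to confirm that the environment matches the version table. The following is a minimal sketch using only the standard-library `importlib.metadata` module. The helper name `find_version_mismatches` is hypothetical and not part of the RBLN SDK; the distribution names and versions are taken from the install commands above.

```python
from importlib import metadata

# Versions pinned by the 2025.02.04 install commands above.
EXPECTED = {
    "rebel-compiler": "0.7.1",
    "optimum-rbln": "0.2.0",
    "vllm-rbln": "0.2.0",
}


def find_version_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `get_version` is injectable so the check can be exercised on a machine
    where the packages are not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `find_version_mismatches(EXPECTED)` returns an empty dictionary when all three packages are installed at the listed versions.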

RBLN Driver:
- Enabled Low Power Management (LPM) feature
- Added support for RSD with long sequence LLM models (>32K)
- Supported driver package installation on RHEL and AlmaLinux
- Improved hDMA performance
RBLN Compiler:
- Features
  - Updated RBLN Compiler for compatibility with the RBLN Driver
  - Added support for Flash Attention
  - Added support for the depth_to_space operation
- Optimization
  - Improved memory tiling algorithms
  - Enhanced collective communication processing for RSD models
  - Increased the write speed of *.rbln files
- API
  - Added a mode option to the options argument in torch.compile()
- Functionality
  - Initial release of the RBLN Profiler
    - The RBLN Compiler v0.7.1 includes the performance profiler, which allows users to view the time spent on each step of the model inference process. For more details, refer to the RBLN Profiler.
Optimum RBLN:
- Public release of Optimum RBLN: GitHub Repository
- Added rbln_attn_impl and rbln_kvcache_partition_len arguments to decoder-only transformer model APIs to support Flash Attention
vLLM RBLN:
- Updated to support Flash Attention
RBLN Model Zoo:
- Added new models
  - HuggingFace
    - Llama3.1-8b
    - Llama3.1-70b
    - Llama3.2-3b
    - Llama3.3-70b
    - KONI-Llama3.1-8b
  - PyTorch
    - YOLOv10-N/S/M/B/L/X
RBLNServe:
- Pinned rebel-compiler version to <0.8, >=0.7
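
The pin above means RBLNServe accepts any rebel-compiler release from 0.7 up to, but not including, 0.8. As an illustration of how such a range is evaluated, here is a small sketch that handles plain numeric versions only. The function names are hypothetical; for real dependency resolution, including pre-release suffixes, pip's own rules or `packaging.specifiers.SpecifierSet` are the better tools.

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_pinned_range(version, lower="0.7", upper="0.8"):
    """True when lower <= version < upper, i.e. the pin '>=0.7, <0.8'."""

    def padded(text):
        # Pad to three components so '0.7' and '0.7.0' compare as equal.
        parts = parse_version(text)
        return parts + (0,) * (3 - len(parts))

    return padded(lower) <= padded(version) < padded(upper)
```

Under this sketch, `in_pinned_range("0.7.1")` is true, while `in_pinned_range("0.6.2")` and `in_pinned_range("0.8.0")` are false.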
Other Changes:
- Documentation refactoring:
  - Added a new section for Software > RBLN Profiler
  - Consolidated model serving content into a single section under Software > Model Serving, which now includes:

2024.12.27

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.12.27.0	v1.1.67	v0.6.2	v0.1.15	v0.1.3	v0.5.5	v0.4.0

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.6.2 optimum-rbln==0.1.15 vllm-rbln==0.1.3

RBLN Compiler:
- Added the get_total_device_alloc() method to the RBLNCompiledModel class, enabling efficient retrieval of the total device memory allocation (in bytes) used by the compiled graph across all NPUs
- Fixed a bug in the split operation handling logic
- Added exception handling for previously uncovered edge cases in the error handling logic
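
Because get_total_device_alloc() reports a raw byte count, a small formatting helper can make the number easier to read. The sketch below assumes only what the note above states, plus the assumption that the method can be called without arguments. `format_device_alloc` and `FakeCompiledModel` are hypothetical names; the stub exists so the helper can run without an NPU or a compiled model.

```python
def format_device_alloc(compiled_model):
    """Render the model's total device allocation in binary units."""
    # Assumption: the method takes no arguments and returns a byte count.
    size = float(compiled_model.get_total_device_alloc())
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.2f} {unit}"
        size /= 1024


class FakeCompiledModel:
    """Hypothetical stand-in for an RBLNCompiledModel instance."""

    def __init__(self, total_bytes):
        self._total_bytes = total_bytes

    def get_total_device_alloc(self):
        return self._total_bytes
```

With a real compiled model, the stub would be replaced by the object returned from compilation.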
Optimum RBLN:
- Added functions:
  - RBLNStableDiffusionInpaintPipeline()
  - RBLNStableDiffusionXLInpaintPipeline()
  - RBLNStableDiffusionXLControlNetPipeline()
  - RBLNStableDiffusionXLControlNetImg2ImgPipeline()
  - RBLNStableDiffusion3Pipeline()
  - RBLNStableDiffusion3Img2ImgPipeline()
  - RBLNStableDiffusion3InpaintPipeline()
- Removed the dependency on optimum
  - This change eliminated the automatic installation of GPU-related dependencies, resulting in a significantly faster installation process
vLLM RBLN:
- Updated to sync with vllm v0.6.5
- Updated to support EXAONE v3.5 models
RBLN Model Zoo:
- Added new models
  - HuggingFace
    - EXAONE-3.5-2.4b
    - EXAONE-3.5-7.8b
    - Stable Diffusion v3
      - Text to image
      - Image to image
      - Inpainting
    - Stable Diffusion
      - Inpainting
    - Stable Diffusion XL
      - Inpainting
      - Text to image + controlNet
      - Image to image + controlNet
  - PyTorch
    - YOLOv5-Face
- Improved the formatting of all model code for better readability and maintainability
- Added new examples demonstrating how to use vLLM's native APIs with a wider range of model architectures:
  - Decoder-only (Llama3)
  - Encoder-decoder (BART)
  - Multi-modal (Llava-next)
- Added supplementary guides for model serving:
  - Tutorial > Advanced > LLM Serving > LLM Serving with Continuous Batching: RBLN Model Zoo Link
  - Software > Model Serving > Nvidia Triton Inference Server: RBLN Model Zoo Link

2024.11.27

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.11.27.0	v1.1.67	v0.6.1	v0.1.13	v0.1.2	v0.5.4	v0.4.0

Note

Deprecation Notice: Python 3.8 Support

As part of our commitment to maintaining compatibility with supported and secure versions of Python, we are officially deprecating support for Python 3.8. This version will no longer be included in future releases, and users are encouraged to upgrade to a more recent Python version to ensure continued support and compatibility.

New Additions: Python 3.11 and Python 3.12

We are pleased to announce that Python 3.11 and Python 3.12 are now fully supported and included in our release package.

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.6.1 optimum-rbln==0.1.13 vllm-rbln==0.1.2

RBLN Compiler:
- Improved operation efficiency:
  - New fusion logic for masked softmax
  - Accelerated take operation for tensor indexing
- Enhanced communication logic for RSD models
- Refactored .rbln file format
Optimum RBLN:
- Added functions:
  - RBLNT5EncoderModel()
- Refactored RBLNStableDiffusion pipelines to enhance functionality and flexibility:
  - LoRA support:
    - The pipelines now include support for Low-Rank Adaptation (LoRA).
  - rbln_config input argument:
    - A new rbln_config argument has been introduced. This configuration is designed specifically for RBLN compilation and includes:
      - Global settings: Parameters such as npu, device, and create_runtimes
      - Image generation settings: Parameters such as batch_size, img_height, img_width, and guidance_scale
  - For more detailed information, please refer to the Model API documentation and the RBLN Model Zoo example
- Added a new from_model() method to the RBLN<ModelName>ForCausalLM classes. This method enables LoRA support by accepting a HuggingFace PreTrainedModel as input, allowing the base model and LoRA adapter to be merged using the merge_and_unload() approach. For more details, please refer to the RBLN Model Zoo example
- Added support for various RoPE (Rotary Position Embedding) methods:
  - default
  - linear
  - dynamic
  - yarn
  - longrope
  - llama3
vLLM RBLN:
- Updated to sync with vllm v0.6.4
- The --compiled_model_dir configuration has been deprecated and will be removed in a future release. Users are encouraged to use the --model argument instead. Please refer to the LLM Serving with Continuous Batching tutorial for an actual use case
RBLN Model Zoo:
- Added new models
  - HuggingFace
    - Qwen2.5-7b/14b
    - Llama3-8b + LoRA
    - StableDiffusion + LoRA
    - StableDiffusionXL + LoRA
- Dependency Updates
  - PyTorch: updated to version 2.5.1
  - TensorFlow: updated to version 2.18.0
- Model deprecation
  - The 3DDFA model has been deprecated due to maintenance discontinuation
RBLNServe:
- Pinned rebel-compiler version to <0.7, >=0.6

2024.10.30

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.10.30.0	v1.1.67	v0.5.12	v0.1.12	v0.1.0	v0.5.3	v0.3.0

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.5.12 optimum-rbln==0.1.12 vllm-rbln==0.1.0

RBLN Compiler:
- Updated to support cosine_similarity operation
- Enabled runtime initialization with RBLNCompiledModel and deprecated the path argument for:
  - Runtime
  - AsyncRuntime
- Added an option to turn double buffering on or off at AsyncRuntime creation through the parallel argument
- Added example_info argument to compile_from_torch() to support compilation without InputInfo creation
- Added device argument to torch.compile to specify the NPU device ID for execution
Optimum RBLN:
- Added functions:
  - RBLNQwen2ForCausalLM()
  - RBLNExaoneForCausalLM()
  - RBLNPhiForCausalLM()
  - RBLNViTImageClassification()
- Updated to support latest transformers (v4.45.2)
vLLM RBLN:
- Updated to support Qwen2, EXAONE, and Phi-2 architectures
RBLN Model Zoo:
- Added new models
  - HuggingFace
    - Qwen2-7b
    - EXAONE-3.0-7.8b
    - Salamandra-7b
    - Phi-2
    - ViT-large
    - Whisper-large-v3-turbo
  - PyTorch Dynamo
    - SAM2.1_hiera_large
- Separated the framework-specific requirements.txt files into individual requirements.txt files for each model

2024.09.27

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.09.27.0	v1.1.67	v0.5.10	v0.1.11	v0.0.7	v0.5.2	v0.3.0

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.5.10 optimum-rbln==0.1.11 vllm-rbln==0.0.7

RBLN Driver:
- Added runtime power management with dynamic PCIe link speed change and PCIe ASPM (Active State Power Management)
- Improved P2P throughput
- Enhanced stability for Rebellions Scalable Design (RSD)
RBLN Compiler:
- Improved internal memory management algorithm
- Updated the runtime description to show NPU version
- Refactored .rbln file format
Optimum RBLN:
- Added functions:
  - RBLNBertModel()
  - RBLNBartModel()
  - RBLNLlavaNextForConditionalGeneration()
- Updated RBLNWhisperForConditionalGeneration to support generating token timestamps and long-form transcription
vLLM RBLN:
- Updated to support Llava-Next, BART, and T5 models
RBLN Model Zoo:
- Added new models
  - HuggingFace
    - Llava-Next
    - E5-Base-4k
    - KoBART
    - BGE-Reranker-Base/Large
  - PyTorch
    - MotionBERT Action Recognition
- Updated Whisper models to support generating token timestamps and long-form transcription
Others:
- Split the LLM Serving tutorial into two parts: with Triton Inference Server and with Continuous Batching
- Added a vLLM API standalone example to the LLM Serving tutorial

2024.08.30

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.08.30.0	v1.0.1	v0.5.9	v0.1.9	-	v0.5.1	v0.3.0
2024.08.30.1	v1.0.5	v0.5.9	v0.1.9	v0.0.6	v0.5.1	v0.3.0

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.5.9 optimum-rbln==0.1.9 vllm-rbln==0.0.6

RBLN Compiler:
- Added model_description() method in Runtime class
- Updated to support where and einsum operations
- Fixed a bug in the strided_slice operation
Optimum RBLN:
- Added functions:
  - RBLNGemmaForCausalLM()
  - RBLNMistralForCausalLM()
  - RBLNDistilBertForQuestionAnswering()
vLLM RBLN:
- Updated to support Gemma and Mistral architectures
RBLN Model Zoo:
- Added new models
  - HuggingFace
    - Gemma-2B
    - Gemma-7B
    - Mistral-7B
    - DistilBERT
  - PyTorch
    - MotionBERT
  - PyTorch Dynamo
    - YOLOv3
    - YOLOv4
    - YOLOv5
    - YOLOv6
    - YOLOvX

2024.08.16

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.08.16.0	v1.0.1	v0.5.8	v0.1.8	-	v0.5.0	v0.3.0
2024.08.16.1	v1.0.5	v0.5.8	v0.1.8	v0.0.4	v0.5.0	v0.3.0

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.5.8 optimum-rbln==0.1.8 vllm-rbln==0.0.4

RBLN Compiler:
- Improved visualization of the compilation progress bar
- Optimized performance for long sequence LLM models
- Reduced DRAM memory consumption for RSD models
- Fixed a bug in the PReLU handling logic
- Initial release of C/C++ runtime libraries:
  - Refer to Software > API > Language Binding > C/C++ for installation, API docs, and tutorials.
Optimum RBLN:
- Added functions:
  - RBLNRobertaForMaskedLM()
  - RBLNRobertaForSequenceClassification()
  - RBLNXLMRobertaModel()
  - RBLNXLMRobertaForSequenceClassification()
vLLM RBLN:
- Updated to support GPT2 and Mi:dm architectures
RBLN Model Zoo:
- Initial release to support torch.compile() in PyTorch 2.0:
  - Visit Tutorial > Basic > PyTorch (Vision), Tutorial > Basic > PyTorch (NLP), and Software > API > Python API pages for more information
  - Examples can be found in RBLN Model Zoo repository
- Added new models (HuggingFace)
  - Mi:dm-7b
  - BGE-M3
  - BGE-Reranker-v2-M3
  - SecureBERT
  - Roberta

2024.07.25

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.07.25.0	v1.0.1	v0.5.7	v0.1.7	-	v0.4.1	v0.3.0
2024.07.25.1	v1.0.5	v0.5.7	v0.1.7	v0.0.3	v0.4.1	v0.3.0

Install Command

pip3 install -i https://pypi.rbln.ai/simple rebel-compiler==0.5.7 optimum-rbln==0.1.7 vllm-rbln==0.0.3

RBLN Compiler:
- Optimized RSD performance for long sequence LLMs
Optimum RBLN:
- Added a warning message for dependency version compatibility
- Added the RBLNDPTForDepthEstimation() function
- Fixed a memory leak in GPT models
RBLN Model Zoo:
- Added a new model (HuggingFace)
  - DPT-large
Others:
- Updated LLM Serving tutorial page
  - Revised the Serving with Triton Inference Server and Continuous Batching Support with vllm-rbln sections
  - Added OpenAI Compatible API Server section

2024.07.10

SDK Version	Driver	Compiler	Optimum RBLN	vLLM RBLN	Model Zoo	RBLNServe
2024.07.10.0	v1.0.1	v0.5.2	v0.1.4	-	v0.4.0	v0.3.0
2024.07.10.1	v1.0.5	v0.5.2	v0.1.4	v0.0.3	v0.4.0	v0.3.0

RBLN Driver:
- Enhanced stability for Rebellions Scalable Design (RSD)
RBLN Compiler:
- Updated to support continuous batching
Optimum RBLN:
- Updated LlamaForCausalLM() class to support continuous batching
vLLM RBLN:
- Initial release to support continuous batching
- Updated the LLM Serving page to include information on continuous batching
RBLN Model Zoo:
- Public release of the RBLN Model Zoo:
  - https://github.com/rebellions-sw/rbln-model-zoo
- Added a new model (PyTorch)
  - ConvTasNet
- Miscellaneous:
  - Removed pipeline() from BERT mlm inference.py
  - Removed pipeline() from BERT qa inference.py
  - Added trust_remote_code=True to the load_dataset() method in AST & Wav2Vec.

2024.06.11: Breaking Changes

SDK Version	Driver	Compiler	Optimum RBLN	Model Zoo	RBLNServe
2024.05.23.0	v0.10.42	v0.3.11	v0.1.0	v0.3.6	v0.1.5
2024.06.11.0	v1.0.1	v0.5.0	v0.1.1	v0.3.10	v0.3.0

Note

BREAKING CHANGES: Please update the RBLN Compiler to the appropriate version as below for compatibility with the updated RBLN Driver. You can check your RBLN Driver version with the rbln-stat -j | grep KMD_version command.

0.10.42: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.3.11
1.0.1: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.5.0
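
The driver-to-compiler pairing can also be expressed as a lookup table. The sketch below is illustrative only: the helper name `install_command` is hypothetical, and the version pairs are copied from the breaking-change notes of this release and the 2024.05.23 release. It expects a driver version string such as the one reported by rbln-stat; parsing the rbln-stat output itself is left out because its JSON layout is not described in these notes.

```python
# Driver (KMD) version -> matching rebel-compiler version, copied from the
# breaking-change notes for the 2024.05.23 and 2024.06.11 releases.
COMPILER_FOR_DRIVER = {
    "0.10.42": "0.3.11",
    "0.12.37": "0.4.0",
    "1.0.1": "0.5.0",
}


def install_command(driver_version):
    """Return the pip command for a driver version listed in the notes."""
    version = driver_version.strip().lstrip("v")  # accept 'v1.0.1' or '1.0.1'
    if version not in COMPILER_FOR_DRIVER:
        raise ValueError(f"no compiler version listed for driver {driver_version!r}")
    compiler = COMPILER_FOR_DRIVER[version]
    return f"pip install -i https://pypi.rbln.ai/simple rebel-compiler=={compiler}"
```

Driver versions not listed in the table raise a ValueError rather than guessing a compiler version.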

RBLN Driver:
- Stable release for Rebellions Scalable Design (RSD)
RBLN Compiler:
- Updated RBLN Compiler to be compatible with RBLN Driver
- Added utility APIs:
  - npu_is_available()
  - get_npu_name()
Optimum RBLN:
- Updated model APIs
  - Please refer to RBLN Model Zoo below
RBLN Model Zoo:
- Added new models (HuggingFace)
  - With Rebellions Scalable Design (RSD)
    - Llama3-8b
    - SOLAR-10.7b
    - EEVE-Korean-10.8b
  - SDXL-base-1.0
  - ControlNet
RBLNServe:
- Pinned rebel-compiler version to <0.6, >=0.5

2024.05.23: Breaking Changes

SDK Version	Driver	Compiler	Optimum RBLN	Model Zoo	RBLNServe
2024.05.23.0	v0.10.42	v0.3.11	v0.1.0	v0.3.6	v0.1.5
2024.05.23.1	v0.12.37	v0.4.0	v0.1.0	v0.3.6	v0.2.0

Note

BREAKING CHANGES: Please update the RBLN Compiler to the appropriate version as below for compatibility with the updated RBLN Driver. You can check your RBLN Driver version with the rbln-stat -j | grep KMD_version command.

0.10.42: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.3.11
0.12.37: pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.4.0

RBLN Driver:
- Added support for Rebellions Scalable Design (RSD)
- rbln-stat (CLI tool) update: Added new columns Name and Power for NPU version and power consumption, respectively
RBLN Compiler:
- Updated RBLN Compiler to be compatible with the RBLN Driver
- Updated input arguments of python user APIs
- Added new user APIs for concurrent processing
- Enabled LLM compilation & inference for Rebellions Scalable Design (RSD)
- Added a new page - Nvidia Triton Inference Server
Optimum RBLN:
- Initial release
- Added a new page - HuggingFace Model Support
RBLN Model Zoo:
- Added new models (HuggingFace)
  - With Rebellions Scalable Design (RSD)
    - Llama2-7b
    - Llama2-13b
  - GPT2, GPT2-medium/large/xl
  - T5-small/base/large/3B
  - BART-base/large
  - BERT-base/large
  - Stable Diffusion v1.5
  - SDXL-turbo
  - Whisper-tiny/base/small/medium/large
  - Wav2Vec2
  - Audio Spectrogram Transformer
RBLNServe:
- Pinned rebel-compiler version to <0.5, >=0.4

2024.01.31: Breaking Changes

SDK Version	Driver	Compiler	Model Zoo	RBLNServe
2024.01.31.0	v0.10.42	v0.3.5	v0.2.0	v0.1.5

Note

BREAKING CHANGES: Please update the RBLN Compiler to the latest version (v0.3.5 or higher) for compatibility with the updated RBLN Driver.

RBLN Driver:
- Refactored device internal command processing logic for stability & scalability
RBLN Compiler:
- Updated RBLN Compiler to be compatible with the RBLN Driver
- Updated device memory scheduling logic
- Enhanced functionality for operation fusion logic
- Updated supported OP list for both TensorFlow and PyTorch
RBLN Model Zoo:
- Added new models (PyTorch):
  - YOLOv4: v4/v4-csp-s-mish/v4-csp-x-mish
  - Video ResNet: r3d_18/mc3_18/r2plus1D_18
  - Video S3D: s3d
- Changed default input size:
  - YOLOv3/4/5/6/7/8
  - deeplabv3_resnet50/resnet101/mobilenetv3_large, fcn_resnet50/101, unet
- Restructured directories:
  - PyTorch image classification examples are moved from rbln_model_zoo/pytorch/vision/classification to rbln_model_zoo/pytorch/vision/image_classification
RBLNServe:
- Pinned rebel-compiler version to <0.4, >=0.3

2023.10.06

SDK Version	Driver	Compiler	Model Zoo	RBLNServe
2023.10.06.0	v0.9.34	v0.2.13	v0.1.9	v0.1.4

RBLN Compiler:
- Updated version parsing module of runtime APIs
- Updated runtime input size calculation logic
- Enhanced functionality for tensor slicing operations
RBLNServe:
- Updated configuration for gRPC/REST protocol

2023.09.12

SDK Version	Driver	Compiler	Model Zoo	RBLNServe
2023.09.12.0	v0.9.34	v0.2.10	v0.1.9	v0.1.1

RBLN Compiler:
- Enabled print() for the rebel.Runtime module - print(module) will show basic information of the loaded model
- Refactored compiler internal large op handling passes for scalability
- Updated error message handling logic
- Fixed bug in a type cast pass
RBLN Model Zoo:
- Updated submodule - YOLOv3
RBLNServe:
- Added --version command

2023.08.18

SDK Version	Driver	Compiler	Model Zoo	RBLNServe
2023.08.18.0	v0.9.34	v0.2.1	v0.1.8	v0.1.0

RBLN Compiler:
- Fixed a bug in the destruction of rebel.Runtime
RBLNServe:
- Initial release
- Added a new page - RBLNServe (Model Server)

2023.08.12: Breaking Changes

SDK Version	Driver	Compiler	Model Zoo
2023.08.12.0	v0.9.34	v0.2.0	v0.1.8

Note

BREAKING CHANGES: Please update the RBLN Compiler to the latest version (v0.2.0 or higher) for compatibility with the updated RBLN Driver.

RBLN Driver:
- Refactored host-device communication protocol for stability & scalability
RBLN Compiler:
- Updated RBLN Compiler to be compatible with the RBLN Driver
Others:
- Added a new page - Kubernetes Support

2023.07.31

SDK Version	Driver	Compiler	Model Zoo
2023.07.31.0	v0.8.44	v0.1.17	v0.1.8

RBLN Compiler:
- Enhanced functionality for normalization operations
- Updated compiler internal scheduling logic
- Updated error message handling logic
RBLN Model Zoo:
- Updated requirements.txt to use ultralytics 8.0.145
- Applied ultralytics 8.0.145 to YOLOv8

2023.07.10

SDK Version	Driver	Compiler	Model Zoo
2023.07.10.0	v0.8.44	v0.1.14	v0.1.7

RBLN Driver:
- Enhanced stability for device reset and recovery
- rbln-stat (CLI tool) update: status categorization of the process
RBLN Compiler:
- Updated input arguments for compile_from_torchscript()
- Enhanced functionality for unary and binary operations
- Optimized build time
RBLN Model Zoo:
- Added new models (PyTorch):
  - YOLOv6: v6s/v6n/v6m/v6l
  - YOLOv7: v7-tiny/v7/v7x
  - YOLOv8: v8s/v8n/v8m/v8l/v8x
- Added new models (TF Keras Applications)
  - MobileNetV3: Small/Large
  - ConvNeXt: Tiny/Small/Base/Large/XLarge
  - RegNetX: 002/004/006/008/016/032/040/064/080/120/160/320
  - RegNetY: 002/004/006/008/016/032/040/064/080/120/160/320

2023.06.20

SDK Version	Driver	Compiler	Model Zoo
2023.06.20.0	v0.7.34	v0.1.8	v0.1.5

RBLN Compiler:
- Added a new compile function - compile_from_torchscript()
- Enhanced functionality for matrix multiplication and pooling operations
- Optimized device memory scheduling
RBLN Model Zoo:
- Added new models (PyTorch):
  - YOLOv3: v3-tiny/v3/v3-spp
  - YOLOv5: v5s/v5n/v5m/v5l/v5x

2023.05.26

SDK Version	Driver	Compiler	Model Zoo
2023.05.26.0	v0.7.34	v0.1.5	v0.1.4

Initial release