vllm-rbln

vllm-rbln is an extension of the vLLM library that enables LLM inference and serving on high-performance RBLN NPUs. It is a modified version of vLLM that works with optimum-rbln and integrates seamlessly with the standard vLLM interface, allowing users to easily deploy and run their LLMs on Rebellions' hardware. The following table lists the models currently supported by vllm-rbln; a minimal usage sketch follows the table.

Architecture | Example Models
RBLNLlamaForCausalLM | Llama-2/3
RBLNGemmaForCausalLM | Gemma
RBLNPhiForCausalLM | Phi-2
RBLNGPT2LMHeadModel | GPT2
RBLNMidmLMHeadModel | Mi:dm
RBLNMistralForCausalLM | Mistral
RBLNExaoneForCausalLM | EXAONE-3/3.5
RBLNQwen2ForCausalLM | Qwen2/2.5
RBLNBartForConditionalGeneration | BART
RBLNT5ForConditionalGeneration | T5
RBLNLlavaNextForConditionalGeneration | LLaVA-NeXT
RBLNQwen2_5_VLForConditionalGeneration | Qwen2.5-VL
RBLNIdefics3ForConditionalGeneration | Idefics3
RBLNT5EncoderModel | T5Encoder-based
RBLNBertModel | BERT-based
RBLNRobertaForSequenceClassification | RoBERTa-based
RBLNRobertaModel | RoBERTa-based
RBLNXLMRobertaForSequenceClassification | XLM-RoBERTa-based
RBLNXLMRobertaModel | XLM-RoBERTa-based
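
Because vllm-rbln integrates with the standard vLLM interface, basic offline inference follows the usual vLLM API. The sketch below is illustrative only: the model path is a placeholder, and RBLN-specific engine arguments (such as pointing at a model compiled with optimum-rbln) may be required. See the tutorials below for complete, working examples.

# Minimal offline-inference sketch using the standard vLLM API.
# Assumes vllm-rbln is installed; the model path is a placeholder and
# RBLN-specific engine arguments may be required (see the tutorials).
from vllm import LLM, SamplingParams

prompts = ["What is an NPU?"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

# Placeholder path: in practice the model is compiled for RBLN NPUs
# with optimum-rbln before being served through vLLM.
llm = LLM(model="/path/to/compiled-model")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)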

How to install

To install vllm-rbln, you need access rights to Rebellions' private PyPI server. Please refer to the Installation Guide for details. You can find the latest package versions in the Release Notes.

$ pip3 install -i https://pypi.rbln.ai/simple vllm-rbln

Note

The vllm-rbln package does not declare a dependency on the vllm package, so installing both can cause conflicts. If you installed the vllm package after vllm-rbln, reinstall the vllm-rbln package to ensure proper functionality.
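
One way to do this is to force a reinstall from the same private index used above:

$ pip3 install --force-reinstall -i https://pypi.rbln.ai/simple vllm-rbln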

Tutorials

To help users get started with vllm-rbln, we provide the following tutorials demonstrating its capabilities and diverse deployment options:

vllm-rbln examples

NVIDIA Triton Inference Server example
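
As a taste of what the tutorials cover, online serving can be launched through vLLM's OpenAI-compatible server entrypoint. This is a sketch under the assumption that vllm-rbln keeps vLLM's standard entrypoint; the model path is a placeholder, and any RBLN-specific flags are covered in the tutorials.

$ python3 -m vllm.entrypoints.openai.api_server --model /path/to/compiled-model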