vllm-rbln

vllm-rbln is an extension of the vLLM library that enables users to leverage the performance of RBLN NPUs for LLM inference and serving. It is a modified version of vLLM that works with optimum-rbln and integrates seamlessly with the vLLM API, allowing users to easily deploy and serve their LLMs on Rebellions' high-performance hardware.
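
As a sketch of what this looks like in practice, the snippet below uses vLLM's standard offline inference API, which vllm-rbln is designed to preserve. The model ID and sampling settings are illustrative only, and any RBLN-specific preparation (such as compiling the model with optimum-rbln beforehand) is assumed and not shown here:

from vllm import LLM, SamplingParams

# Illustrative model ID; in practice this would be a model prepared
# for RBLN NPUs (e.g. compiled via optimum-rbln).
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

# Standard vLLM sampling configuration.
params = SamplingParams(temperature=0.8, max_tokens=128)

# Generate completions; with vllm-rbln installed, inference is
# expected to run on RBLN NPUs rather than GPUs.
outputs = llm.generate(["What is an NPU?"], params)
for output in outputs:
    print(output.outputs[0].text)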

How to install

To install vllm-rbln, you need access to Rebellions' private PyPI server. Please refer to the Installation Guide for more information. You can find the latest versions of the packages in the Release Note.

$ pip3 install -i https://pypi.rbln.ai/simple vllm-rbln

Note

Since the vllm-rbln package does not depend on the vllm package, installing both can cause operational issues. If you installed vllm after vllm-rbln, reinstall vllm-rbln to ensure proper functionality.
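
For example, a forced reinstall of vllm-rbln restores its files after a later vllm installation (--force-reinstall is a standard pip option; the index URL is the same one used above):

$ pip3 install -i https://pypi.rbln.ai/simple --force-reinstall vllm-rbln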

Tutorials

To help users get started with vllm-rbln, we provide the following tutorials demonstrating its capabilities and deployment options:

vllm-rbln examples

NVIDIA Triton Inference Server example