# vllm-rbln

`vllm-rbln` is an extension of the vLLM library that enables users to leverage the exceptional performance of RBLN NPUs for LLM inference and serving. It is a modified version of vLLM that works with `optimum-rbln`, integrating seamlessly with the vLLM interface so users can easily deploy and serve their LLMs on our high-performance hardware.
## How to install
To install `vllm-rbln`, you need access rights to Rebellions' private PyPI server. Please refer to the Installation Guide for more information. You can find the latest versions of the packages in the Release Note.
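A minimal sketch of the install step; the index URL below is a placeholder, so substitute the actual URL from the Installation Guide:

```bash
# Install vllm-rbln from Rebellions' private PyPI server.
# The index URL is a placeholder; use the one from the Installation Guide.
pip install vllm-rbln --extra-index-url https://pypi.example.rbln/simple/
```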
Note
Since the `vllm-rbln` package does not depend on the `vllm` package, duplicate installations may cause operational issues. If you installed the `vllm` package after `vllm-rbln`, please reinstall the `vllm-rbln` package to ensure proper functionality.
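For example, if `vllm` was installed on top of `vllm-rbln`, a forced reinstall along these lines restores the RBLN-enabled package (the index URL is again a placeholder):

```bash
# Reinstall vllm-rbln so its files take precedence over the duplicate vllm install.
pip install --force-reinstall vllm-rbln --extra-index-url https://pypi.example.rbln/simple/
```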
## Tutorials
To help users get started with `vllm-rbln`, we have created three comprehensive tutorials demonstrating its capabilities and diverse deployment options:
### `vllm-rbln` examples
- vLLM Native API provides sample code showing how to use the vLLM native API with `vllm-rbln` (see the sketch after this list)
- OpenAI Compatible Server demonstrates how to create an OpenAI-compatible server leveraging `vllm-rbln` (see the client sketch after this list)
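As a minimal sketch of the first tutorial's flow, assuming `vllm-rbln` preserves vLLM's standard `LLM` interface (the model ID is a placeholder; RBLN-specific model preparation and engine options are covered in the tutorial itself):

```python
from vllm import LLM, SamplingParams

# Placeholder model ID; the tutorial describes how to prepare a model for RBLN NPUs.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

# Standard vLLM sampling configuration.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Batched offline generation through the vLLM native API.
outputs = llm.generate(["What is an NPU?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```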
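For the OpenAI-compatible server, vLLM's stock entrypoint is `python -m vllm.entrypoints.openai.api_server --model <model>`; the tutorial may add RBLN-specific options. Once the server is running, any OpenAI client can query it. A sketch using the `openai` Python package, assuming the default port and a placeholder model name:

```python
from openai import OpenAI

# Point the client at the local vLLM server; the key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder; must match the served model
    prompt="What is an NPU?",
    max_tokens=128,
)
print(response.choices[0].text)
```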
### Nvidia Triton Inference Server example
- Triton Inference Server with vLLM backend focuses on deploying LLMs with `vllm-rbln` and Nvidia Triton Inference Server (see the request sketch below)
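As a rough sketch of querying such a deployment, assuming Triton's vLLM backend with its generate extension, a model named `vllm_model`, and the default HTTP port (all values are placeholders taken from the upstream Triton vLLM tutorial conventions, not RBLN-specific settings):

```python
import requests

# Triton's generate endpoint; model name and port are placeholder assumptions.
url = "http://localhost:8000/v2/models/vllm_model/generate"

payload = {
    "text_input": "What is an NPU?",    # prompt field used by Triton's vLLM backend
    "parameters": {"max_tokens": 128},  # sampling parameters forwarded to vLLM
}

response = requests.post(url, json=payload)
response.raise_for_status()
print(response.json()["text_output"])
```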