YOLOv8¶
In this tutorial, we will guide you through the steps required to integrate RBLN SDK with TorchServe using a precompiled YOLOv8
model. For instructions on setting up the TorchServe environment, refer to TorchServe.
You can check out the actual commands required to compile the model in our model zoo.
Note
This tutorial assumes that you are familiar with compiling and running inference using the RBLN SDK. If you are not familiar with RBLN SDK, refer to PyTorch/TensorFlow tutorials and the API Documentation.
Prerequisites¶
Before we start, please make sure you have prepared the following prerequisites in your system:
- Ubuntu 20.04 LTS (Debian bullseye) or higher
- RBLN NPUs equipped (e.g., RBLN ATOM)
- Python (supports 3.9 - 3.12)
- RBLN SDK (driver, compiler)
- TorchServe
- Compiled YOLOv8 model (yolov8l.rbln) - ultralytics (v8.0.145)
- COCO label file (coco128.yaml)
  - The COCO label file can be found at ultralytics/ultralytics/cfg/datasets/coco128.yaml when compiling yolov8 from the model zoo.
Quick Start with TorchServe¶
In TorchServe, models are served as Model Archive (.mar) units, which contain all the information necessary for serving the model. The following guide explains how to create a .mar file and use it for model serving.
Write the Model Request Handler¶
Below is a simple handler that inherits from TorchServe BaseHandler for YOLOv8
inference requests. This handler defines initialize()
, inference()
, postprocess()
, and handle()
for model serving. The initialize()
method is called when the model is loaded from the model_store
directory, and the handle()
method is invoked for TorchServe Inference API's predictions request.
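A minimal sketch of such a handler is shown below. The rebel.Runtime loading and run() calls, the fixed 640x640 resize, and the simplified postprocess() are assumptions for illustration; a complete handler would also implement YOLOv8 letterbox preprocessing, NMS, and mapping of class indices to the COCO label names.

```python
# yolov8l_handler.py - illustrative sketch of a TorchServe custom handler
import io
import os

import numpy as np
import torch
import yaml
from PIL import Image
from ts.torch_handler.base_handler import BaseHandler

import rebel  # RBLN runtime; rebel.Runtime(...).run(...) is assumed here


class YOLOv8Handler(BaseHandler):
    """Serves a precompiled yolov8l.rbln model on an RBLN NPU."""

    def initialize(self, context):
        # Called once when the model is loaded from the model_store directory.
        model_dir = context.system_properties.get("model_dir")

        # Load the compiled RBLN model (assumed API: rebel.Runtime).
        self.module = rebel.Runtime(os.path.join(model_dir, "yolov8l.rbln"))

        # Class names come from coco128.yaml, packaged via --extra-files.
        with open(os.path.join(model_dir, "coco128.yaml")) as f:
            self.labels = yaml.safe_load(f)["names"]

        self.input_size = 640  # assumed compile-time input resolution
        self.initialized = True

    def preprocess(self, data):
        # Decode the raw request body into a normalized NCHW float tensor.
        body = data[0].get("data") or data[0].get("body")
        image = Image.open(io.BytesIO(body)).convert("RGB")
        image = image.resize((self.input_size, self.input_size))
        array = np.asarray(image, dtype=np.float32).transpose(2, 0, 1) / 255.0
        return torch.from_numpy(array).unsqueeze(0)

    def inference(self, data, *args, **kwargs):
        # Run the compiled model on the RBLN NPU.
        return self.module.run(data)

    def postprocess(self, inference_output):
        # Placeholder: a full handler applies NMS here and maps class
        # indices to the COCO label names loaded in initialize().
        output = inference_output[0] if isinstance(inference_output, (list, tuple)) else inference_output
        output = torch.as_tensor(output)
        return [{"output_shape": list(output.shape), "num_classes": len(self.labels)}]

    def handle(self, data, context):
        # Entry point for TorchServe Inference API prediction requests.
        inputs = self.preprocess(data)
        outputs = self.inference(inputs)
        return self.postprocess(outputs)
```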
Write the Model Configuration¶
Create a config.properties file as follows. The maximum request size (max_request_size) is set to 100 MB in this example to accommodate the input image payload.
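A minimal sketch of the file is shown below, assuming the default TorchServe ports (8080 for inference, 8081 for management); only the 100 MB request/response limits follow from the description above.

```properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
# 100 MB limits to allow large input images
max_request_size=104857600
max_response_size=104857600
default_workers_per_model=1
```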
Model Archiving with torch-model-archiver¶
The model_store directory stores the .mar files used for serving, including the YOLOv8 model archive created in this tutorial.
Now that the setup for model archiving is complete, run the torch-model-archiver command to create the model archive file. The model_store folder, where the generated yolov8l.mar archive file is located, will be passed as a parameter when TorchServe starts.
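A command along the following lines produces the archive; the version number and relative paths are illustrative and should match your own file layout.

```bash
mkdir -p ./model_store

torch-model-archiver --model-name yolov8l \
                     --version 1.0 \
                     --serialized-file ./yolov8l.rbln \
                     --handler ./yolov8l_handler.py \
                     --extra-files ./coco128.yaml \
                     --export-path ./model_store
```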
The options passed to torch-model-archiver are as follows.
- --model-name: Specifies the name of the model to be served, set as yolov8l.
- --version: Defines the version of the model to be served with TorchServe.
- --serialized-file: Specifies the weight file of the compiled model. Set to yolov8l.rbln.
- --handler: Specifies the handler script for the model, set as yolov8l_handler.py.
- --extra-files: Specifies additional files that need to be included in the archive, set as coco128.yaml.
- --export-path: Specifies the output directory for the archived file. The previously created model_store folder is set as the destination.
After executing the command, the yolov8l.mar file is generated in the model_store directory specified by --export-path.
Run torchserve¶
TorchServe can be started using the following command. For a simple test where token authentication is not required, you can use the --disable-token-auth option.
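A start command of the following form works under this setup; the paths are assumed to match the config.properties file and model_store directory created above.

```bash
torchserve --start --ncs \
           --ts-config ./config.properties \
           --model-store ./model_store \
           --models yolov8l.mar \
           --disable-token-auth
```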
- --start: Starts the TorchServe service.
- --ncs: Disables the snapshot feature.
- --ts-config: Specifies the TorchServe configuration file.
- --model-store: Specifies the directory containing the model archive (.mar) files.
- --models: Specifies the model to serve. If all is specified, all models in the model_store directory are designated as serving models.
- --disable-token-auth: Disables token authentication.
When TorchServe is successfully started, it operates in the background. The command to stop TorchServe is shown below:
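Stopping the service uses the standard TorchServe command:

```bash
torchserve --stop
```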
TorchServe provides the Management API on port 8081 and the Inference API on port 8080 by default.
You can check the list of models currently being served using the following Management API.
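For example, the list-models endpoint can be queried with curl; localhost is assumed here since TorchServe runs on the same machine.

```bash
curl -X GET "http://localhost:8081/models"
```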
If the operation is successful, the response lists the YOLOv8 model being served, similar to the following.
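The exact fields may vary by TorchServe version; a typical response looks like this:

```json
{
  "models": [
    {
      "modelName": "yolov8l",
      "modelUrl": "yolov8l.mar"
    }
  ]
}
```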
Inference Request with TorchServe Inference API¶
Now we can send an inference request using the Prediction API from the TorchServe Inference API to test the YOLOv8 model served with TorchServe.
Download a sample image for the YOLOv8 inference request.
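Any test image will do; here the bus.jpg sample referenced in the Ultralytics documentation is used as an assumed example.

```bash
curl -L -o bus.jpg https://ultralytics.com/images/bus.jpg
```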
Make an inference request using the TorchServe Inference API with curl.
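The Prediction API accepts the image as the request body, and the model name in the URL matches the --model-name used during archiving.

```bash
curl -X POST "http://localhost:8080/predictions/yolov8l" \
     -H "Content-Type: application/octet-stream" \
     --data-binary @./bus.jpg
```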
If the inference request is successful, TorchServe returns the detection results produced by the handler's postprocess() step.