ResNet50¶
In this tutorial, we will guide you through the steps required to integrate the RBLN SDK with TorchServe using a precompiled ResNet50
model. For instructions on setting up the TorchServe environment, please refer to TorchServe.
You can find the actual commands required to compile the model and set up TorchServe in our model zoo.
Note
This tutorial assumes that you are familiar with compiling and running inference using the RBLN SDK. If you are not familiar with RBLN SDK, refer to PyTorch/TensorFlow tutorials and the API Documentation.
Prerequisites¶
Before we start, please make sure you have prepared the following prerequisites in your system:
- Ubuntu 20.04 LTS (Debian bullseye) or higher
- System equipped with RBLN NPUs (e.g., RBLN ATOM)
- Python (3.9 - 3.12 supported)
- RBLN SDK (driver, compiler)
- TorchServe
- Compiled ResNet50 model (`resnet50.rbln`)
Quick Start with TorchServe¶
In TorchServe, models are served as Model Archive (`.mar`) units, which contain all the information needed to serve the model. The following guide explains how to create a `.mar` file and use it for model serving.
Write the Model Request Handler¶
Below is a simple handler that inherits from the TorchServe `BaseHandler` to serve `ResNet50` inference requests. This handler defines `initialize()`, `inference()`, `postprocess()`, and `handle()` for model serving. The `initialize()` method is called when the model is loaded from the `model_store` directory, and the `handle()` method is invoked for each prediction request made through the TorchServe Inference API.
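A minimal sketch of such a handler is shown below. It assumes the compiled model is loaded through the RBLN runtime (`rebel.Runtime`) and executed with its `run()` method, and that standard ImageNet preprocessing is applied; the file name `resnet50_handler.py` matches the handler path passed to `torch-model-archiver` later in this tutorial. Treat it as a starting point rather than a drop-in implementation.

```python
# resnet50_handler.py -- sketch of a TorchServe handler for a precompiled
# RBLN ResNet50 model. The rebel.Runtime API and the exact preprocessing
# are assumptions; adjust them to your RBLN SDK version and model.
import io
import os

import rebel  # RBLN SDK runtime (assumed import name)
import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler


class ResNet50Handler(BaseHandler):
    def __init__(self):
        super().__init__()
        self.initialized = False

    def initialize(self, context):
        # Called once when the model is loaded from the model_store directory.
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        model_path = os.path.join(model_dir, "resnet50.rbln")
        # Load the precompiled model onto the RBLN NPU (assumed API).
        self.module = rebel.Runtime(model_path)
        # Standard ImageNet preprocessing for ResNet50.
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])
        self.initialized = True

    def inference(self, batch):
        # Run the precompiled model on the RBLN NPU (run() signature is an
        # assumption; adjust to the RBLN SDK version in use).
        return self.module.run(batch.numpy())

    def postprocess(self, outputs):
        # Return the index of the top-1 class as a JSON-serializable list.
        scores = torch.from_numpy(outputs)
        return [{"class_index": int(scores.argmax(dim=1)[0])}]

    def handle(self, data, context):
        # Entry point invoked for each prediction request from the
        # TorchServe Inference API.
        image_bytes = data[0].get("data") or data[0].get("body")
        image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        batch = self.transform(image).unsqueeze(0)
        outputs = self.inference(batch)
        return self.postprocess(outputs)
```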
Write the Model Configuration¶
Create the `model_config.yaml` file as shown below. This file contains the necessary information for serving the model. In this tutorial, to limit the number of workers to a single instance, set both `minWorkers` and `maxWorkers` to 1.
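A minimal configuration along these lines could look as follows; it contains only the worker limits described above, and any additional TorchServe options can be added as needed.

```yaml
# model_config.yaml -- limit the model to a single worker instance
minWorkers: 1
maxWorkers: 1
```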
Model Archiving with `torch-model-archiver`¶
The `model_store` directory stores the `.mar` files to be served, including the `ResNet50` model archive used in this tutorial.
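If the directory does not exist yet, it can be created as follows:

```bash
mkdir -p ./model_store
```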
Once the model archiving setup is complete, run the `torch-model-archiver` command to create the model archive file. The `model_store` folder, where the generated `resnet50.mar` archive file is located, will be passed as a parameter when TorchServe starts.
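Assuming the file names used in this tutorial, the invocation could look like the following (the `--version` value is an arbitrary example):

```bash
torch-model-archiver \
    --model-name resnet50 \
    --version 1.0 \
    --serialized-file ./resnet50.rbln \
    --handler ./resnet50_handler.py \
    --config-file ./model_config.yaml \
    --export-path ./model_store
```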
The options passed to `torch-model-archiver` are as follows.
- `--model-name`: Specifies the name of the model to be served, set as `resnet50`.
- `--version`: Defines the version of the model to be served with TorchServe.
- `--serialized-file`: Specifies the weight file of the compiled model, set as `./resnet50.rbln`.
- `--handler`: Specifies the handler script for the model, set as `./resnet50_handler.py`.
- `--config-file`: Specifies the YAML configuration file for the model, set as `./model_config.yaml`.
- `--export-path`: Specifies the output directory for the archive file. The previously created `model_store` folder is set as the destination.
After executing the command, the `resnet50.mar` file is generated in the `model_store` directory specified by `--export-path`.
Run `torchserve`¶
TorchServe can be started using the following command. For a simple test where token authentication is not required, you can use the `--disable-token-auth` option.
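A start command along these lines can be used; the individual options are explained below.

```bash
torchserve --start --ncs \
    --model-store ./model_store \
    --models resnet50.mar \
    --disable-token-auth
```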
- `--start`: Starts the TorchServe service.
- `--ncs`: Disables the snapshot feature.
- `--model-store`: Specifies the directory containing the model archive (`.mar`) files.
- `--models`: Specifies the model(s) to serve. If `all` is specified, all models in the `model_store` directory are designated as serving models.
- `--disable-token-auth`: Disables token authentication.
When TorchServe is successfully started, it operates in the background. The command to stop TorchServe is shown below:
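```bash
torchserve --stop
```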
TorchServe provides the Management API on port `8081` and the Inference API on port `8080` by default.
You can check the list of models currently being served using the following Management API.
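For example, using the default Management API port:

```bash
curl http://localhost:8081/models
```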
If the operation is successful, you can verify that the `resnet50` model is being served.
Inference Request with the TorchServe Inference API¶
Now we can test the served `ResNet50` model by sending a request to the Prediction API of the TorchServe Inference API.
Download a sample image for the `ResNet50` inference request.
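The tutorial does not mandate a specific image; any ImageNet-style image will do. As one example, the kitten sample image from the TorchServe repository can be used:

```bash
wget https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
```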
Make an inference request using the TorchServe Inference API with `curl`.
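For example, assuming the sample image downloaded above and the default Inference API port:

```bash
curl http://localhost:8080/predictions/resnet50 -T kitten_small.jpg
```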
If the inference request is successful, a response containing the model's prediction for the image is returned.