ResNet50¶
In this tutorial, we will guide you through the steps required to integrate RBLN SDK with TorchServe using a precompiled ResNet50 model. For instructions on setting up the TorchServe environment, please refer to TorchServe.
You can find the actual commands required to compile the model and serve it with TorchServe in our model zoo.
Note
This tutorial assumes that you are familiar with compiling and running inference using the RBLN SDK. If you are not familiar with RBLN SDK, refer to PyTorch/TensorFlow tutorials and the API Documentation.
Prerequisites¶
Before we start, please make sure you have prepared the following prerequisites in your system:
- Ubuntu 20.04 LTS (Debian bullseye) or higher
- RBLN NPUs equipped (e.g., RBLN ATOM)
- Python (supports 3.9 - 3.12)
- RBLN SDK (driver, compiler)
- TorchServe
- Compiled ResNet50 model (resnet50.rbln)
Quick Start with TorchServe¶
In TorchServe, models are served as Model Archive (.mar) units, which contain all necessary information for serving the model. The following guide explains how to create a .mar file and use it for model serving.
Write the Model Request Handler¶
Below is a simple handler that inherits from TorchServe's BaseHandler to serve ResNet50 inference requests. The handler defines initialize(), inference(), postprocess(), and handle() for model serving. The initialize() method is called when the model is loaded from the model_store directory, and the handle() method is invoked for each prediction request sent to the TorchServe Inference API.
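The sketch below illustrates what such a handler could look like. It is not the exact handler from the model zoo: the RBLN runtime calls (the rebel import, rebel.Runtime(), and its run() method), the preprocessing pipeline, and the top-1 postprocessing are assumptions for this sketch, so refer to the model zoo example for the authoritative version.

```python
# resnet50_handler.py -- illustrative sketch only.
# The RBLN runtime API used here (rebel.Runtime, tensor_type, run) is an assumption;
# consult the RBLN SDK documentation / model zoo for the canonical handler.
import io
import os

import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler

import rebel  # RBLN SDK runtime (assumed import name)


class Resnet50Handler(BaseHandler):
    def __init__(self):
        super().__init__()
        self.initialized = False
        self.module = None
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def initialize(self, context):
        # Called once when the model is loaded from the model_store directory.
        model_dir = context.system_properties.get("model_dir")
        serialized_file = context.manifest["model"]["serializedFile"]
        model_path = os.path.join(model_dir, serialized_file)
        # Load the precompiled ResNet50 model onto the RBLN NPU (assumed API).
        self.module = rebel.Runtime(model_path, tensor_type="pt")
        self.initialized = True

    def preprocess(self, data):
        # Decode the request payload into a normalized input batch.
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(payload)).convert("RGB")
            images.append(self.transform(image))
        return torch.stack(images)

    def inference(self, batch):
        # Run inference on the RBLN NPU (assumed run() signature).
        return self.module.run(batch)

    def postprocess(self, output):
        # Return the top-1 class index per request item.
        return output.argmax(dim=1).tolist()

    def handle(self, data, context):
        # Entry point for the TorchServe Inference API predictions request.
        if not self.initialized:
            self.initialize(context)
        batch = self.preprocess(data)
        output = self.inference(batch)
        return self.postprocess(output)
```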
Write the Model Configuration¶
Create the model_config.yaml file as shown below. This file contains the necessary information for serving the model. In this tutorial, to limit the number of workers to a single instance, set both minWorkers and maxWorkers to 1.
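A minimal configuration consistent with this tutorial could look like the following sketch; only the worker limits are required here, and any additional fields should follow the TorchServe model configuration reference.

```yaml
# model_config.yaml -- minimal sketch; limits the model to a single worker instance.
minWorkers: 1
maxWorkers: 1
```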
Model Archiving with torch-model-archiver¶
Create the model_store directory, which holds the .mar files to be served, including the ResNet50 model archive generated in this tutorial.
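For example, assuming you are working in the directory that contains resnet50.rbln, resnet50_handler.py, and model_config.yaml:

```bash
# Create the directory that will hold the generated .mar archives.
mkdir -p model_store
```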
Once the model archiving setup is complete, run the torch-model-archiver command to create the model archive file. The model_store folder, where the generated resnet50.mar archive file is located, will be passed as a parameter when TorchServe starts.
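Based on the options explained below, the command would look roughly like the following sketch (the version number 1.0 is an arbitrary value chosen for this example):

```bash
torch-model-archiver \
    --model-name resnet50 \
    --version 1.0 \
    --serialized-file ./resnet50.rbln \
    --handler ./resnet50_handler.py \
    --config-file ./model_config.yaml \
    --export-path ./model_store/
```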
The options passed to torch-model-archiver are as follows.
- --model-name: Specifies the name of the model to be served, set as resnet50.
- --version: Defines the version of the model to be served with TorchServe.
- --serialized-file: Specifies the weight file of the compiled model, set as ./resnet50.rbln.
- --handler: Specifies the handler script for the model, set as ./resnet50_handler.py.
- --config-file: Specifies the YAML configuration file for the model, set as ./model_config.yaml.
- --export-path: Specifies the output directory for the archived file. The previously created model_store folder is set as the destination.
After executing the command, the resnet50.mar file is generated in the model_store directory specified by --export-path.
Run torchserve¶
TorchServe can be started using the following command. For a simple test where token authentication is not required, you can use the --disable-token-auth option.
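A start command consistent with the options described below might look like this sketch (it assumes the resnet50.mar archive created above is in ./model_store):

```bash
torchserve --start --ncs \
    --model-store ./model_store \
    --models resnet50.mar \
    --disable-token-auth
```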
- --start: Starts the TorchServe service.
- --ncs: Disables the snapshot feature.
- --model-store: Specifies the directory containing the model archive (.mar) files.
- --models: Specifies the model(s) to serve. If all is specified, all models in the model_store directory are designated as serving models.
- --disable-token-auth: Disables token authentication.
When TorchServe is successfully started, it operates in the background. The command to stop TorchServe is shown below:
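```bash
torchserve --stop
```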
TorchServe provides the Management API on port 8081 and the Inference API on port 8080 by default.
You can check the list of models currently being served using the following Management API.
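Assuming the default Management API port (8081):

```bash
curl -X GET http://localhost:8081/models
```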
If the operation is successful, you can verify that the resnet50 model is being served.
Inference Request with TorchServe Inference API¶
Now, we can send an inference request using the Prediction API from the TorchServe Inference API to test the ResNet50 model served with TorchServe.
Download a sample image for the ResNet50 inference request.
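Any RGB image works for this test. As one possible sketch, the sample kitten image used in the TorchServe examples can be fetched (the URL is an assumption taken from the TorchServe repository; substitute any local image if it is unavailable):

```bash
wget https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
```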
Make an inference request using the TorchServe Inference API with curl.
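Assuming the default Inference API port (8080), the model name used when archiving (resnet50), and the sample image downloaded above:

```bash
curl -X POST http://localhost:8080/predictions/resnet50 -T kitten_small.jpg
```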
If the inference request is successful, the following response is returned.