ResNet50¶
In this tutorial, we will guide you through the steps required to integrate RBLN SDK with TorchServe using a precompiled ResNet50 model. For instructions on setting up the TorchServe environment, please refer to TorchServe.
You can find the actual commands required to compile the model and serve it with TorchServe in our model zoo.
Note
This tutorial assumes that you are familiar with compiling and running inference using the RBLN SDK. If you are not familiar with RBLN SDK, refer to PyTorch/TensorFlow tutorials and the API Documentation.
Prerequisites¶
Before we start, please make sure you have prepared the following prerequisites in your system:
- Ubuntu 20.04 LTS (Debian bullseye) or higher
- RBLN NPUs equipped (e.g., RBLN ATOM)
- Python (supports 3.9 - 3.12)
- RBLN SDK (driver, compiler)
- TorchServe
- Compiled ResNet50 model (resnet50.rbln)
Quick Start with TorchServe¶
In TorchServe, models are served as Model Archive (.mar) units, which contain all necessary information for serving the model. The following guide explains how to create a .mar file and use it for model serving.
Write the Model Request Handler¶
Below is a simple handler that inherits from TorchServe's BaseHandler to serve ResNet50 inference requests. The handler defines initialize(), inference(), postprocess(), and handle() for model serving. The initialize() method is called when the model is loaded from the model_store directory, and the handle() method is invoked for each prediction request sent to the TorchServe Inference API.
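The sketch below illustrates what such a handler could look like. It is not the exact handler from the model zoo: the RBLN runtime calls (the rebel import, rebel.Runtime(), and its run() method), the preprocessing pipeline, and the top-1 postprocessing are assumptions for this sketch, so refer to the model zoo example for the authoritative version.

```python
# resnet50_handler.py -- illustrative sketch only.
# The RBLN runtime API used here (rebel.Runtime, tensor_type, run) is an assumption;
# consult the RBLN SDK documentation / model zoo for the canonical handler.
import io
import os

import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler

import rebel  # RBLN SDK runtime (assumed import name)


class Resnet50Handler(BaseHandler):
    def __init__(self):
        super().__init__()
        self.initialized = False
        self.module = None
        self.transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def initialize(self, context):
        # Called once when the model is loaded from the model_store directory.
        model_dir = context.system_properties.get("model_dir")
        serialized_file = context.manifest["model"]["serializedFile"]
        model_path = os.path.join(model_dir, serialized_file)
        # Load the precompiled ResNet50 model onto the RBLN NPU (assumed API).
        self.module = rebel.Runtime(model_path, tensor_type="pt")
        self.initialized = True

    def preprocess(self, data):
        # Decode the request payload into a normalized input batch.
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(payload)).convert("RGB")
            images.append(self.transform(image))
        return torch.stack(images)

    def inference(self, batch):
        # Run inference on the RBLN NPU (assumed run() signature).
        return self.module.run(batch)

    def postprocess(self, output):
        # Return the top-1 class index per request item.
        return output.argmax(dim=1).tolist()

    def handle(self, data, context):
        # Entry point for the TorchServe Inference API predictions request.
        if not self.initialized:
            self.initialize(context)
        batch = self.preprocess(data)
        output = self.inference(batch)
        return self.postprocess(output)
```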
Write the Model Configuration¶
Create the model_config.yaml file as shown below. This file contains the necessary information for serving the model. In this tutorial, to limit the number of workers to a single instance, set both minWorkers and maxWorkers to 1.
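A minimal configuration consistent with this tutorial could look like the following sketch; only the worker limits are required here, and any additional fields should follow the TorchServe model configuration reference.

```yaml
# model_config.yaml -- minimal sketch; limits the model to a single worker instance.
minWorkers: 1
maxWorkers: 1
```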
Model Archiving with torch-model-archiver¶
Create the model_store directory, which holds the .mar files to be served, including the ResNet50 model archive generated in this tutorial.
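For example, assuming you are working in the directory that contains resnet50.rbln, resnet50_handler.py, and model_config.yaml:

```bash
# Create the directory that will hold the generated .mar archives.
mkdir -p model_store
```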
Once the model archiving setup is complete, run the torch-model-archiver command to create the model archive file. The model_store folder, where the generated resnet50.mar archive file is located, will be passed as a parameter when TorchServe starts.
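Based on the options explained below, the command would look roughly like the following sketch (the version number 1.0 is an arbitrary value chosen for this example):

```bash
torch-model-archiver \
    --model-name resnet50 \
    --version 1.0 \
    --serialized-file ./resnet50.rbln \
    --handler ./resnet50_handler.py \
    --config-file ./model_config.yaml \
    --export-path ./model_store/
```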
The options passed to torch-model-archiver are as follows.
- --model-name: Specifies the name of the model to be served, set as resnet50.
- --version: Defines the version of the model to be served with TorchServe.
- --serialized-file: Specifies the weight file of the compiled model, set as ./resnet50.rbln.
- --handler: Specifies the handler script for the model, set as ./resnet50_handler.py.
- --config-file: Specifies the YAML configuration file for the model, set as ./model_config.yaml.
- --export-path: Specifies the output directory for the archived file. The previously created model_store folder is set as the destination.
After executing the command, the resnet50.mar file is generated in the model_store directory specified by --export-path.
Run torchserve¶
TorchServe can be started using the following command. For a simple test where token authentication is not required, you can use the --disable-token-auth option.
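A start command consistent with the options described below might look like this sketch (it assumes the resnet50.mar archive created above is in ./model_store):

```bash
torchserve --start --ncs \
    --model-store ./model_store \
    --models resnet50.mar \
    --disable-token-auth
```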
- --start: Starts the TorchServe service.
- --ncs: Disables the snapshot feature.
- --model-store: Specifies the directory containing the model archive (.mar) files.
- --models: Specifies the model(s) to serve. If all is specified, all models in the model_store directory are designated as serving models.
- --disable-token-auth: Disables token authentication.
When TorchServe is successfully started, it operates in the background. The command to stop TorchServe is shown below:
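```bash
torchserve --stop
```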
TorchServe provides the Management API on port 8081 and the Inference API on port 8080 by default.
You can check the list of models currently being served using the following Management API.
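Assuming the default Management API port (8081):

```bash
curl -X GET http://localhost:8081/models
```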
If the operation is successful, you can verify that the resnet50 model is being served.
Inference Request with TorchServe Inference API¶
Now, we can send an inference request using the Prediction API from the TorchServe Inference API to test the ResNet50 model served with TorchServe.
Download a sample image for the ResNet50 inference request.
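Any RGB image works for this test. As one possible sketch, the sample kitten image used in the TorchServe examples can be fetched (the URL is an assumption taken from the TorchServe repository; substitute any local image if it is unavailable):

```bash
wget https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
```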
Make an inference request using the TorchServe Inference API with curl.
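Assuming the default Inference API port (8080), the model name used when archiving (resnet50), and the sample image downloaded above:

```bash
curl -X POST http://localhost:8080/predictions/resnet50 -T kitten_small.jpg
```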
If the inference request is successful, the following response is returned.