ResNet50
In this tutorial, we will guide you through the steps required to integrate RBLN SDK with TorchServe using a precompiled ResNet50
model. For instructions on setting up the TorchServe environment, please refer to TorchServe.
You can find the actual commands required to compile the model and serve it with TorchServe in our model zoo.
Note
This tutorial assumes that you are familiar with compiling and running inference using the RBLN SDK. If you are not familiar with RBLN SDK, refer to PyTorch/TensorFlow tutorials and the API Documentation.
Prerequisites
Before we start, please make sure you have prepared the following prerequisites in your system:
Quick Start with TorchServe
In TorchServe, models are served as Model Archive (.mar) units, which contain all the information necessary for serving the model. The following guide explains how to create a .mar file and use it for model serving.
Write the Model Request Handler
Below is a simple handler that inherits from TorchServe's BaseHandler to serve ResNet50 inference requests. This handler defines initialize(), preprocess(), inference(), postprocess(), and handle() for model serving. The initialize() method is called when the model is loaded from the model_store directory, and the handle() method is invoked for each prediction request made through the TorchServe Inference API.
resnet50_handler.py

# resnet50_handler.py
import io
import os

import PIL.Image as Image
import torch
from torchvision.models import ResNet50_Weights

import rebel  # RBLN Runtime
from ts.torch_handler.base_handler import BaseHandler


class Resnet50Handler(BaseHandler):
    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None
        self.weights = None

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time.
        :param context: Initial context contains model server system properties.
        :return:
        """
        self._context = context
        # Locate the compiled model file inside the model archive
        model_dir = context.system_properties.get("model_dir")
        serialized_file = context.manifest["model"].get("serializedFile")
        model_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_path):
            raise RuntimeError(
                f"[RBLN ERROR] File not found at the specified model_path({model_path})."
            )
        self.module = rebel.Runtime(model_path, tensor_type="pt")
        self.weights = ResNet50_Weights.DEFAULT
        self.initialized = True

    def preprocess(self, data):
        """
        Transform raw input into model input data.
        :param data: list of raw requests, should match batch size
        :return: preprocessed model input tensor
        """
        input_data = data[0].get("data")
        if input_data is None:
            input_data = data[0].get("body")
        if input_data is None:
            raise ValueError("[RBLN][ERROR] Data not found with client request.")
        if not isinstance(input_data, (bytes, bytearray)):
            raise ValueError("[RBLN][ERROR] Preprocessed data is not binary data.")
        try:
            image = Image.open(io.BytesIO(input_data))
        except Exception as e:
            raise ValueError(f"[RBLN][ERROR] Invalid image data: {e}")
        # Apply the torchvision preprocessing pipeline and add a batch dimension
        prep = self.weights.transforms()
        return prep(image).unsqueeze(0)

    def inference(self, model_input):
        """
        Internal inference method.
        :param model_input: transformed model input data
        :return: inference output tensor
        """
        model_output = self.module.run(model_input)
        return model_output

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: inference output tensor
        :return: predicted category name
        """
        score, class_id = torch.topk(inference_output, 1, dim=1)
        category_name = self.weights.meta["categories"][class_id]
        return category_name

    def handle(self, data, context):
        """
        Invoked by TorchServe for prediction requests.
        Performs pre-processing of the data, prediction using the model,
        and post-processing of the prediction output.
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        category_name = self.postprocess(model_output)
        print("[RBLN][INFO] Top1 category: ", category_name)
        return [{"result": category_name}]
Write the Model Configuration
Create the config.properties file as shown below. This file contains the information necessary for serving the model. In this tutorial, to limit the number of workers to a single instance, set default_workers_per_model to 1.
config.properties

default_workers_per_model=1
models={\
    "resnet50":{\
        "1.0":{\
            "marName": "resnet50.mar",\
            "responseTimeout": 120\
        }\
    }\
}
Model Archiving with torch-model-archiver
The model_store directory stores the .mar files used for serving, including the ResNet50 model archive created in this tutorial.
Once the model archiving setup is complete, run the torch-model-archiver command to create the model archive file. The model_store folder, where the generated resnet50.mar archive file is located, will be passed as a parameter when TorchServe starts.
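If the model_store directory does not exist yet, create it first (the name only needs to match the --export-path option used below):

$ mkdir -p ./model_store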
$ torch-model-archiver \
    --model-name resnet50 \
    --version 1.0 \
    --serialized-file ./resnet50.rbln \
    --handler ./resnet50_handler.py \
    --export-path ./model_store/
The options passed to torch-model-archiver are as follows:

--model-name: Specifies the name of the model to be served, set to resnet50.
--version: Defines the version of the model to be served with TorchServe.
--serialized-file: Specifies the weight file, set to ./resnet50.rbln.
--handler: Specifies the handler script for the model, set to ./resnet50_handler.py.
--export-path: Specifies the output directory for the archive file. The previously created model_store folder is set as the destination.

After executing the command, the resnet50.mar file is generated in the model_store directory specified by --export-path.
+--(YOUR_PATH)/
|  +--model_store/
|  |  +--resnet50.mar
|  +--resnet50.rbln
|  +--resnet50_handler.py
|  +--config.properties
Run torchserve
TorchServe can be started using the following command. For a simple test where token authentication is not required, you can use the --disable-token-auth option.
$ torchserve --start --ncs \
    --ts-config ./config.properties \
    --model-store ./model_store \
    --models resnet50=resnet50.mar \
    --disable-token-auth
--start: Starts the TorchServe service.
--ncs: Disables the snapshot feature.
--ts-config: Specifies the configuration file for torchserve, set to ./config.properties.
--model-store: Specifies the directory containing the model archive (.mar) files.
--models: Specifies the model(s) to serve. If all is specified, all models in the model_store directory are designated as serving models.
--disable-token-auth: Disables token authentication.
When TorchServe is successfully started, it operates in the background. The command to stop TorchServe is shown below:
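$ torchserve --stop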
TorchServe provides the Management API on port 8081 and the Inference API on port 8080 by default.
You can check the list of models currently being served using the following Management API.
$ curl -X GET "http://localhost:8081/models"
If the operation is successful, you can verify that the resnet50 model is being served.
{
  "models": [
    {
      "modelName": "resnet50",
      "modelUrl": "resnet50.mar"
    }
  ]
}
Inference Request with TorchServe Inference API
Now, we can send an inference request using the Prediction API of the TorchServe Inference API to test the ResNet50 model served with TorchServe.
Download a sample image for the ResNet50 inference request.
$ wget https://rbln-public.s3.ap-northeast-2.amazonaws.com/images/tabby.jpg
Make an inference request using the TorchServe Inference API with curl.
$ curl -X POST "http://127.0.0.1:8080/predictions/resnet50" -H "Content-Type: application/octet-stream" --data-binary @./tabby.jpg
If the inference request is successful, the following response is returned.
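Based on the handler above, which returns [{"result": category_name}], the response should look similar to the following (the exact label depends on the model output; for the sample image the expected top-1 category is "tabby"):

{
  "result": "tabby"
}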
Advanced Features
Batch Inference in TorchServe
TorchServe supports Batch Inference, a method of grouping multiple inference requests together and processing them all at once.
Batch Inference Configuration
To use Batch Inference in TorchServe, the model configuration must include the following two required settings:

batchSize: The maximum batch size that the model can handle.
maxBatchDelay: The maximum wait time (in milliseconds) that TorchServe will hold requests to reach the defined batchSize. If the number of received requests does not reach the maximum batch size within the specified delay, all currently received requests are sent to the handler for processing.
In the config.properties file, specify the batch settings using batchSize and maxBatchDelay as shown below.
config_b4.properties

default_workers_per_model=1
models={\
    "resnet50":{\
        "1.0":{\
            "marName": "resnet50.mar",\
            "batchSize": 4,\
            "maxBatchDelay": 100,\
            "responseTimeout": 120\
        }\
    }\
}
Model Compilation
Bucketing is the process of compiling a model multiple times with different target input shapes to create optimized bucketed models. The RBLN Compiler supports bucketing by compiling models for various input shapes, enhancing Batch Inference and improving memory efficiency.
Below is an example code snippet demonstrating how to define a bucketed model that supports batch sizes ranging from 1 to 4:
import rebel  # RBLN Compiler
from torchvision.models import resnet50, ResNet50_Weights

# Load the pretrained ResNet50 model in evaluation mode
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()

size = 224  # Width and height of image
batches = [1, 2, 3, 4]  # Supported batch sizes
input_infos = []

# Create input information for each batch size
for batch in batches:
    input_info = [("x", [batch, 3, size, size], "float32")]
    input_infos.append(input_info)

# Compile the model with the pre-defined input information
compiled_model = rebel.compile_from_torch(model, input_info=input_infos)

# Save the compiled model
compiled_model.save("resnet50.rbln")
When saving the compiled model, the file name must match the --serialized-file parameter specified in torch-model-archiver so that it can be correctly loaded by the model handler.
Model Handler
The model handler creates a runtime for each supported batch size and uses the one that matches the number of incoming requests to perform inference on the provided input data.
resnet50_batch_handler.py

# resnet50_batch_handler.py
import io
import os

import numpy as np
import PIL.Image as Image
import torch
from torchvision.models import ResNet50_Weights

import rebel  # RBLN Runtime
from ts.torch_handler.base_handler import BaseHandler


class Resnet50Handler(BaseHandler):
    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None
        self.weights = None
        self.prep = None
        self.batch_size = None
        self.max_batch_size = None

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time.
        :param context: Initial context contains model server system properties.
        :return:
        """
        self._context = context
        model_dir = context.system_properties.get("model_dir")
        serialized_file = context.manifest["model"].get("serializedFile")
        self.max_batch_size = context.system_properties["batch_size"]
        model_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_path):
            raise RuntimeError(
                f"[RBLN ERROR] File not found at the specified model_path({model_path})."
            )
        # Create one runtime per bucketed batch size
        # (input_info_index i corresponds to batch size i + 1)
        self.modules = []
        compiled_model = rebel.RBLNCompiledModel(model_path)
        for i in range(self.max_batch_size):
            self.modules.append(
                compiled_model.create_runtime(input_info_index=i, tensor_type="pt")
            )
        self.weights = ResNet50_Weights.DEFAULT
        self.prep = self.weights.transforms()
        self.initialized = True

    def preprocess(self, data):
        """
        Transform raw input into model input data.
        :param data: list of raw requests, should match batch size
        :return: preprocessed model input tensor
        """
        # Take the input data and make it inference ready
        self.batch_size = num_requests = len(data)
        if self.batch_size > self.max_batch_size:
            raise ValueError(
                f"[RBLN][ERROR] Number of batched inputs ({self.batch_size})"
                f" exceeds the batchSize ({self.max_batch_size}) in the configuration."
            )
        images = []
        for i in range(num_requests):
            input_data = data[i].get("data")
            if input_data is None:
                input_data = data[i].get("body")
            if input_data is None:
                raise ValueError("[RBLN][ERROR] Data not found with client request.")
            if not isinstance(input_data, (bytes, bytearray)):
                raise ValueError("[RBLN][ERROR] Preprocessed data is not binary data.")
            try:
                image = Image.open(io.BytesIO(input_data))
            except Exception as e:
                raise ValueError(f"[RBLN][ERROR] Invalid image data: {e}")
            batch = self.prep(image).unsqueeze(0)
            images.append(batch.numpy())
        # Stack the per-request tensors into a single batched input
        preprocessed_data = np.concatenate(images, axis=0).copy()
        return torch.from_numpy(preprocessed_data)

    def inference(self, model_input):
        """
        Internal inference method.
        :param model_input: transformed model input data
        :return: inference output tensor
        """
        # Select the runtime that matches the actual batch size of this request group
        model_output = self.modules[self.batch_size - 1].run(model_input)
        return model_output

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: inference output tensor
        :return: list of predicted category names
        """
        category_names = []
        # Split the batched output back into one result per request
        for result in torch.split(inference_output, 1, dim=0):
            score, class_id = torch.topk(result, 1, dim=1)
            category_names.append(self.weights.meta["categories"][class_id])
        return category_names

    def handle(self, data, context):
        """
        Invoked by TorchServe for prediction requests.
        Performs pre-processing of the data, prediction using the model,
        and post-processing of the prediction output.
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        category_names = self.postprocess(model_output)
        results = []
        for idx, category_name in enumerate(category_names):
            print("[RBLN][INFO][", idx, "] Top1 category: ", category_name)
            results.append(f"result[{idx}] : {category_name}")
        return results
Model Serving
Using the previously created configuration, model, and model handler, start model serving by following the steps in "Model Archiving with torch-model-archiver" and "Run torchserve".
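As a sketch (assuming the file layout from the earlier sections, with resnet50_batch_handler.py as the handler and config_b4.properties as the configuration file), the archiving and serving commands would look like this:

$ torch-model-archiver \
    --model-name resnet50 \
    --version 1.0 \
    --serialized-file ./resnet50.rbln \
    --handler ./resnet50_batch_handler.py \
    --export-path ./model_store/ \
    --force
$ torchserve --start --ncs \
    --ts-config ./config_b4.properties \
    --model-store ./model_store \
    --models resnet50=resnet50.mar \
    --disable-token-auth

The --force option overwrites the resnet50.mar file generated earlier in this tutorial.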
You can verify whether the configuration has been applied correctly by using the following Management API command:
$ curl -X GET "http://localhost:8081/models/resnet50"
Check whether batchSize and maxBatchDelay are set to the specified values in the response.
[
  {
    "modelName": "resnet50",
    "modelVersion": "1.0",
    "modelUrl": "resnet50.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 4,
    "maxBatchDelay": 100,
    ...
    "workers": [
      {
        ...
      }
    ],
    ...
  }
]
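To see batch inference in action, you can send several requests at nearly the same time, for example with a simple shell loop using the sample image downloaded earlier (this is a sketch; any client that issues concurrent requests will do):

$ for i in 1 2 3 4; do
    curl -X POST "http://127.0.0.1:8080/predictions/resnet50" \
        -H "Content-Type: application/octet-stream" \
        --data-binary @./tabby.jpg &
done; wait

Requests that arrive within the configured maxBatchDelay (100 ms) are grouped into a single batch of up to batchSize (4) and passed to the handler together.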