Image Classification using C/C++ APIs¶
This tutorial shows how to deploy a PyTorch ResNet50 model using the RBLN SDK C/C++ API. The model is compiled with the RBLN SDK Python API, and the resulting *.rbln file is deployed with the RBLN SDK C/C++ API.
This approach combines the ease of model preparation in Python with the performance benefits of C/C++ for inference. The entire code used in this tutorial can be found in RBLN Model Zoo.
The tutorial is divided into two parts:
- How to compile the PyTorch ResNet50 model and save the compiled model with the Python API
- How to deploy the compiled model in the C/C++ runtime-based inference environment
Prerequisites¶
Before we start, please make sure the following packages are installed on your system:
- Model compilation
- RBLN SDK C/C++ API
  - cmake >= 3.26.0
  - RBLN SDK C/C++ API
Step 1. How to compile¶
While the RBLN Python API offers comprehensive functionality and handles both compilation and inference within the RBLN SDK, the RBLN SDK C/C++ API is designed and optimized for inference only.
In this tutorial, we'll use the RBLN Python API for model compilation and the RBLN C/C++ API for inference.
Prepare the model¶
First, we import the ResNet50 model from the TorchVision library.
Compile the model¶
Once a torch model (a torch.nn.Module) is prepared, we can compile it with the rebel.compile_from_torch() function.
If the NPU is installed on your host machine, you can omit the npu argument of the rebel.compile_from_torch() function. In this case, the function automatically detects and uses the installed NPU. However, if the NPU is not installed on your host machine, you need to specify the target NPU with the npu argument to avoid errors.
Currently, two NPU names are supported: RBLN-CA02 and RBLN-CA12. If you are unsure about the name of your target NPU, you can check it by running the rbln-stat command on the host machine where the NPU is installed.
Save the compiled model¶
To save the compiled model to your local storage, use the compiled_model.save() method.
This function allows you to store the compiled model for deployment. Here's how you can implement this step:
Complete compilation¶
The compilation code snippets above are included in compile.py. To compile the model and generate the *.rbln file, run compile.py with the following command:
Upon successful completion of this process, you will find resnet50.rbln in your local storage. This file contains the compiled ResNet50 model, ready for deployment using the RBLN SDK C/C++ API.
Step 2. How to deploy using RBLN SDK C/C++ API¶
Now we can use the RBLN SDK C/C++ API to load the compiled model, run inference, and check the output results.
Prepare CMake build script¶
This tutorial uses OpenCV for image pre/post-processing and argparse to parse user parameters from the command-line interface (CLI). The following CMake script declares the dependencies on these external packages and how to link them with our example application code.
Prepare the input¶
We need to preprocess the input image into the form required by the pre-trained ResNet50 model. Here, we perform the preprocessing with the vision APIs provided by OpenCV.
Run inference¶
The RBLN SDK C/C++ API supports both synchronous and asynchronous inference. The APIs used in each mode are briefly described below.
The RBLN API rbln_create_model loads the compiled model; it takes the path of the saved model as an input argument.
We can use rbln_create_runtime to create a synchronous runtime from an RBLNModel, a module name, and a device ID.
For asynchronous operation, you can create an asynchronous runtime by passing the same arguments used for the synchronous runtime to rbln_create_async_runtime.
To assign the input image at runtime, we use rbln_set_input. This API takes as arguments the RBLNRuntime, the index of the input buffer, and the address of the preprocessed buffer.
This API is applicable only for synchronous operations.
After all inputs have been set, we can perform synchronous inference by calling rbln_run with the RBLNRuntime as an argument.
For asynchronous operation, inference is performed by passing the input and output buffers to rbln_async_run.
Finally, we can retrieve the output buffer containing the inference results using rbln_get_output. This API takes the RBLNRuntime and the output index as arguments.
For asynchronous operation, since the input and output buffers were passed when calling rbln_async_run, you can directly reference these output buffers.
For the required API usage in each inference mode, please refer to the following two examples:
- Synchronous Execution
- Asynchronous Execution
Post Processing¶
The output logits are a float32 array of shape (1, 1000), where each element is the score of the corresponding category in the ImageNet dataset. We can derive the top-1 index from these logits and use it to look up the corresponding category in a pre-defined list of class names.
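The top-1 lookup described above can be sketched in standard C++ as follows. This is an illustrative sketch, not the tutorial's own code: the helper names `top1_index` and `top1_label`, and the use of a `std::vector<std::string>` as the class-name table, are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the largest score among
// `count` float32 logits. Assumes count > 0.
std::size_t top1_index(const float* logits, std::size_t count) {
  return static_cast<std::size_t>(
      std::distance(logits, std::max_element(logits, logits + count)));
}

// Hypothetical helper: maps an index to its class name, falling back to a
// placeholder when the table does not cover the index.
std::string top1_label(const std::vector<std::string>& labels,
                       std::size_t index) {
  return index < labels.size() ? labels[index] : std::string("<unknown>");
}
```

For the ResNet50 output, `count` would be 1000, and `logits` would point at the float32 output buffer obtained from the runtime.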
Release resources¶
How to build using CMake¶
The complete code for the above API example is included in the RBLN Model Zoo C++ examples. You can easily compile the code and generate the executable binary with the following commands:
${SAMPLE_PATH} denotes the path of the example application (e.g., rbln-model-zoo/cpp/image_classification).
Note
As previously mentioned, our example application uses OpenCV APIs for image processing tasks. For this purpose, the CMake build system will fetch and install OpenCV directly from its source. Please note that this installation process may require over 5 minutes to complete, depending on your system's specifications and internet connection speed.
How to run the executable file¶
If you have completed all of the steps above, you can find the executable binaries under the cmake directory, named image_classification and image_classification_async for synchronous and asynchronous inference, respectively.
- Synchronous Execution
- Asynchronous Execution
The results will look like this: