Object Detection using C/C++ APIs
In this tutorial, we will learn how to deploy a PyTorch YOLOv8 model using the RBLN SDK C/C++ API. The model is compiled with the RBLN SDK Python API, and the resulting `*.rbln` file is deployed with the RBLN SDK C/C++ API. This approach combines the ease of model preparation in Python with the performance benefits of C/C++ for inference. The entire code used in this tutorial can be found in the RBLN Model Zoo.
This tutorial includes the following steps:

- How to compile the PyTorch `YOLOv8m` model and save it to local storage using the Python API
- How to deploy the compiled model in a C/C++ runtime-based inference environment
Prerequisites

Before we start, please make sure you have installed the following packages:

- RBLN Python API (for model compilation)
- RBLN SDK C/C++ API
- cmake >= 3.26.0
Step 1. How to compile

While the RBLN Python API offers comprehensive functionality, handling both compilation and inference within the RBLN SDK, the RBLN SDK C/C++ API is designed and optimized for inference only. We will therefore use the RBLN Python API for model compilation and the RBLN C/C++ API for inference.
Prepare the model

First, we import the `YOLOv8m` model from the ultralytics library.
Compile the model

Once the torch model (a `torch.nn.Module`) is prepared, we can simply compile it with the `rebel.compile_from_torch()` method.
If an NPU is installed on your host machine, you can omit the `npu` argument in `rebel.compile_from_torch()`; the function will automatically detect and use the installed NPU. If no NPU is installed, however, you must specify the target NPU via the `npu` argument to avoid errors. Two NPU names are currently supported: `RBLN-CA02` and `RBLN-CA12`. If you are unsure of your target NPU's name, run the `rbln-stat` command in a shell on the host machine where the NPU is installed.
Save the compiled model

To save the compiled model to your local storage, use the `compiled_model.save()` method, which stores the compiled model for later deployment.
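Putting the preparation, compilation, and save steps together, a minimal compile.py might look like the sketch below. The input-info layout and the `"input"` tensor name are assumptions for illustration; check the RBLN Python API reference for the exact signature.

```python
# Sketch of compile.py -- assumes the RBLN Python API (`rebel`) and
# the `ultralytics` package are installed.
import rebel
from ultralytics import YOLO

# Load the pretrained YOLOv8m torch module.
model = YOLO("yolov8m.pt").model
model.eval()

# Compile for the target NPU; omit `npu` if an NPU is installed locally.
# The (name, shape, dtype) input description is an assumed layout.
compiled_model = rebel.compile_from_torch(
    model,
    [("input", [1, 3, 640, 640], "float32")],
    npu="RBLN-CA12",
)

# Save the compiled model for deployment with the C/C++ API.
compiled_model.save("yolov8m.rbln")
```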
Complete compilation

The compilation code explained above is included in compile.py. Running compile.py compiles the model and generates the `.rbln` file.
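For example, from the directory containing the script:

```shell
python3 compile.py
```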
Upon successful completion of this process, you will find `yolov8m.rbln` in your local storage. This file encapsulates the compiled YOLOv8 model, ready to deploy using the RBLN SDK C/C++ API.
Step 2. How to deploy using the RBLN SDK C/C++ API

Now we can use the RBLN SDK C/C++ API to load the compiled model, run inference, and check the output results.
Prepare CMake build script

The example application uses the OpenCV library for image pre- and post-processing, and the argparse library to parse user parameters from the command-line interface (CLI). The following CMake script describes the dependencies on these external packages and how to link them with the example application code.
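A build script along these lines would satisfy those dependencies. The repository tags and especially the RBLN runtime library name (`rbln_runtime` below) are assumptions; consult the RBLN SDK installation notes for the actual library to link against.

```cmake
cmake_minimum_required(VERSION 3.26)
project(object_detection CXX)

include(FetchContent)

# argparse: header-only CLI parser.
FetchContent_Declare(argparse
  GIT_REPOSITORY https://github.com/p-ranav/argparse.git
  GIT_TAG        v3.0)

# OpenCV: fetched and built from source (this is the slow step).
FetchContent_Declare(opencv
  GIT_REPOSITORY https://github.com/opencv/opencv.git
  GIT_TAG        4.9.0)

FetchContent_MakeAvailable(argparse opencv)

add_executable(object_detection main.cc)
target_link_libraries(object_detection PRIVATE
  argparse opencv_core opencv_imgproc opencv_imgcodecs opencv_dnn
  rbln_runtime)  # assumed name of the RBLN SDK runtime library
```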
Prepare the input

We need to prepare a preprocessed image as the input data required by the pretrained `YOLOv8m` model. Here, we perform preprocessing on the input image using various vision APIs provided by OpenCV.
Run inference

The RBLN SDK provides both synchronous and asynchronous inference methods. The simplified APIs are described below.

The RBLN API `rbln_create_model` is used to load the compiled model, taking the path of the saved model as its input argument. We can then use `rbln_create_runtime` to create a synchronous runtime from the `RBLNModel`, a module name, and a device ID. For asynchronous operation, you can create an asynchronous runtime by passing the same arguments to `rbln_create_async_runtime`.
To assign the input image at runtime, we use `rbln_set_input`. This API takes the `RBLNRuntime`, the index of the input buffer, and the address of the preprocessed buffer as arguments. It is applicable only to synchronous operation.
After all inputs have been updated, we can perform synchronous inference by calling `rbln_run` with the `RBLNRuntime` as an argument. For asynchronous operation, inference is performed by passing the input and output buffers to `rbln_async_run`.
Finally, we can retrieve the output buffer containing the inference results using `rbln_get_output`, which takes the `RBLNRuntime` and an output index as arguments. For asynchronous operation, since the input and output buffers were passed when calling `rbln_async_run`, you can reference those output buffers directly.
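The synchronous calls described above chain together as in the sketch below. Handle types, return-value checks, and the module-name and device-id values are illustrative simplifications, and `preprocessed` stands for the buffer produced in the preprocessing step; see the RBLN SDK C/C++ API reference for the exact signatures.

```c
/* Illustrative synchronous flow only -- error handling omitted. */
RBLNModel *model = rbln_create_model("yolov8m.rbln");

/* Module name "default" and device id 0 are assumed example values. */
RBLNRuntime *runtime = rbln_create_runtime(model, "default", 0);

/* Bind input 0 to the preprocessed image buffer. */
rbln_set_input(runtime, 0, preprocessed);

/* Blocking inference. */
rbln_run(runtime);

/* Output 0: float32 data with shape (1, 84, 8400). */
float *output = (float *)rbln_get_output(runtime, 0);
```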
For the required API usage in each inference mode, please refer to the following two examples:

- Synchronous Execution
- Asynchronous Execution
Post Processing

The output is a float32 array with shape (1, 84, 8400): for each of the 8400 candidate detections, the 84 values encode the bounding box coordinates together with the per-class confidence scores. The output is processed to separate boxes, confidences, and class IDs, after which non-maximum suppression (NMS) is applied. After NMS is completed, the remaining detections undergo post-processing for box rounding.
The sample code below proceeds in the following order:

- Convert the model output data to OpenCV `Mat` format.
- Iterate through each detected object:
  - Calculate the bounding box coordinates.
  - Find the class confidence scores and select the class with the highest score.
  - Store this information in arrays.
- Apply NMS to remove duplicate detections.
- For each detection remaining after NMS:
  - Draw a bounding box on the original image.
  - Add the class name and confidence score as text above the box, taken from the matching category.
- Save the result image to local storage.
Release resources

Once inference is finished, release the runtime and model handles created above so that the associated device resources are freed.
How to build using CMake

The code snippets above are included in the RBLN Model Zoo C++ examples. To compile the code and create the executable binaries, build the project with CMake.
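A typical out-of-source build might look like this; the build directory is named `cmake` here to match where the tutorial later looks for the binaries:

```shell
cd ${SAMPLE_PATH}
mkdir -p cmake && cd cmake
cmake ..
make
```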
`${SAMPLE_PATH}` denotes the path of the example application (e.g., rbln-model-zoo/cpp/object_detection).
Note

As previously mentioned, our example application uses OpenCV APIs for image processing tasks. For this purpose, the CMake build system fetches and builds OpenCV directly from source. Please note that this may take more than 5 minutes, depending on your system's specifications and internet connection speed.
How to run the executable files

If you have completed all the steps, you will find the executable files under the `cmake` directory, named `object_detection` and `object_detection_async` for synchronous and asynchronous inference, respectively.

- Synchronous execution
- Asynchronous execution
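For example, from the `cmake` build directory (the `--input` flag and image name are hypothetical; run the binaries with `--help` to see the actual options argparse exposes):

```shell
./object_detection --input people.jpg         # synchronous
./object_detection_async --input people.jpg   # asynchronous
```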
The output will be as follows: