
Object Detection using C/C++ APIs

In this tutorial, we will learn how to deploy a PyTorch YOLOv8 model using the RBLN SDK C/C++ API. The model is compiled using the RBLN SDK Python API, and the resulting *.rbln file is deployed using the RBLN SDK C/C++ API.

This approach combines the ease of model preparation in Python with the performance benefits of C/C++ for inference. The entire code used in this tutorial can be found in the RBLN Model Zoo.

This tutorial includes the following steps:

  1. How to compile the PyTorch YOLOv8m model and save it to local storage with the Python API
  2. How to deploy the compiled model in a C/C++ runtime-based inference environment

Prerequisites

Before we start, please make sure you have installed the following packages:

Step 1. How to compile

While the RBLN Python API offers comprehensive functionality, handling both compilation and inference within the RBLN SDK, the RBLN SDK C/C++ API is designed and optimized for inference only.

We will use the RBLN Python API for model compilation and RBLN C/C++ API for performing inference.

Prepare the model

First, we import the YOLOv8m model from the ultralytics library.

from ultralytics import YOLO
import rebel
import torch

model_name = "yolov8m"
yolo = YOLO(f"{model_name}.pt")
model = yolo.model.eval()
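# Run a dummy forward pass with the expected input shape (1, 3, 640, 640)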
model(torch.zeros(1, 3, 640, 640))

Compile the model

Once a PyTorch model (torch.nn.Module) is prepared, we can simply compile it with the rebel.compile_from_torch() method.

# Compile the model
compiled_model = rebel.compile_from_torch(model, [("x", [1, 3, 640, 640], "float32")], npu="RBLN-CA02")

If an NPU is installed on your host machine, you can omit the npu argument in rebel.compile_from_torch(); the function will automatically detect and use the installed NPU. However, if no NPU is installed on your host machine, you need to specify the target NPU with the npu argument to avoid errors.

Currently, two NPU names are supported: RBLN-CA02 and RBLN-CA12. If you are unsure of your target NPU's name, you can check it by running the rbln-stat command in a shell on the host machine where the NPU is installed.

Save the compiled model

To save the compiled model in your local storage, you can utilize the compiled_model.save() method. This function allows you to store the compiled model for deployment. Here's how you can implement this step:

# Save the compiled model to local storage
compiled_model.save(f"{model_name}.rbln")   # model_name = yolov8m

Complete compilation

The compilation code explained above is included in compile.py. To compile the model and generate the *.rbln file, execute compile.py with the following command:

python compile.py --model-name=yolov8m

Upon successful completion of this process, you will find yolov8m.rbln in your local storage. This file encapsulates the compiled YOLOv8 model, ready to deploy using the RBLN SDK C/C++ API.

Step 2. How to deploy using RBLN SDK C/C++ API

Now, we can deploy the model using the RBLN SDK C/C++ API to load the compiled model, run inference, and check the output results.

Prepare the CMake build script

The example application uses the OpenCV library for image pre/post-processing, and the argparse library to parse user parameters from the command-line interface (CLI).

The following CMake script describes the dependencies on external packages and how to link them with our example application code.

# Define dependencies on external packages
include(FetchContent)
include(cmake/opencv.cmake)
include(cmake/argparse.cmake)

# Define the name of the executable
add_executable(object_detection main.cc)

# Update link info for package dependencies: OpenCV
find_package(OpenCV CONFIG REQUIRED)
target_link_libraries(object_detection ${OpenCV_LIBS})

# Update link info for dependencies: RBLN
find_package(rbln CONFIG REQUIRED)
target_link_libraries(object_detection rbln::rbln_runtime)

# Add include directories for dependencies: argparse
target_include_directories(object_detection PRIVATE ${argparse_INCLUDE_DIRS})

Prepare the input

We need to preprocess the input image into the form required by the pretrained YOLOv8m model. Here, we perform the preprocessing using the vision APIs provided by OpenCV.

  // Preprocess the input image
  std::string input_path = "${SAMPLE_PATH}/people4.jpg";
  cv::Mat image;
  try {
    image = cv::imread(input_path);
  } catch (const cv::Exception &err) {
    std::cerr << err.what() << std::endl;
    std::exit(1);
  }
  // cv::imread returns an empty matrix (rather than throwing) when the file
  // cannot be read, so check for that case as well
  if (image.empty()) {
    std::cerr << "Failed to read image: " << input_path << std::endl;
    std::exit(1);
  }

  // Convert the letterboxed image to a normalized float32 blob (NCHW, RGB)
  cv::Mat blob =
      cv::dnn::blobFromImage(GetSquareImage(image, 640), 1. / 255., cv::Size(),
                             cv::Scalar(), true, false, CV_32F);
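
GetSquareImage is a helper defined in the example source; it letterboxes the input image onto a square canvas so that the 640x640 network input preserves the original aspect ratio. Below is a minimal sketch of what such a helper might look like, assuming top-left padding followed by a resize; the actual implementation in the RBLN Model Zoo may differ.

  #include <algorithm>
  #include <opencv2/opencv.hpp>

  // Sketch of a letterbox helper (assumed behavior, not the Model Zoo source):
  // copy the image onto a square canvas sized by its longer side, then resize
  // the canvas to the requested network input size (e.g. 640x640).
  cv::Mat GetSquareImage(const cv::Mat &img, int target_size) {
    int side = std::max(img.cols, img.rows);
    cv::Mat square = cv::Mat::zeros(side, side, img.type());
    img.copyTo(square(cv::Rect(0, 0, img.cols, img.rows)));
    cv::Mat resized;
    cv::resize(square, resized, cv::Size(target_size, target_size));
    return resized;
  }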

Run inference

The RBLN SDK provides both synchronous and asynchronous inference methods. For descriptions of the simplified APIs, please refer to the information below.

The RBLN API rbln_create_model is used to load the compiled model by passing the path of the saved model as an input argument.

We can use rbln_create_runtime to create a synchronous runtime from the RBLNModel, a module name, and a device ID. For asynchronous operation, you can create an asynchronous runtime by passing the same arguments to rbln_create_async_runtime.

To assign the input image at runtime, we use rbln_set_input. This API takes as arguments the RBLNRuntime, the index of the input buffer, and the address of the preprocessed buffer. This API is applicable only for synchronous operations.

After all inputs have been updated, we can perform synchronized inference by calling rbln_run with RBLNRuntime as an argument. For asynchronous operations, asynchronous inference can be performed by passing input and output buffers to rbln_async_run.

Finally, we can retrieve the output buffer containing the inference results using rbln_get_output. This API takes RBLNRuntime and the output index as arguments. For asynchronous operations, since the input and output buffers were passed when calling rbln_async_run, you can directly reference those output buffers.

For the required API usage in each inference mode, please refer to the following two examples:

  • Synchronous Execution

      std::string model_path = "${SAMPLE_PATH}/yolov8m.rbln";
    
      RBLNModel *mod = rbln_create_model(model_path.c_str());
      RBLNRuntime *rt = rbln_create_runtime(mod, "default", 0);
    
      // Set input data
      rbln_set_input(rt, 0, blob.data);
    
      // Run sync inference
      rbln_run(rt);
    
      // Get output results
      void *data = rbln_get_output(rt, 0);
    

  • Asynchronous Execution

      std::string model_path = "${SAMPLE_PATH}/yolov8m.rbln";
    
      RBLNModel *mod = rbln_create_model(model_path.c_str());
      RBLNRuntime *rt = rbln_create_async_runtime(mod, "default", 0);
    
      // Alloc output buffer
      auto buf_size = rbln_get_layout_nbytes(rbln_get_output_layout(rt, 0));
      std::vector<float> logits(buf_size/sizeof(float));
    
      // Run async inference
      int rid = rbln_async_run(rt, blob.data, logits.data());
    
      // Wait until inference is done
      rbln_async_wait(rt, rid, 1000);
    

Post Processing

The output is a float32 array with a shape of (1, 84, 8400): each of the 8400 candidate detections holds 4 bounding-box values (center x, center y, width, height) followed by 80 class confidence scores.

The output is processed to separate boxes, confidences, and class IDs. Then non-maximum suppression (NMS) is applied. After NMS is completed, the remaining detections are scaled back to the original image size and drawn on it.

The sample code below proceeds in the following order:

  1. Convert the model output data to OpenCV Mat format.
  2. Iterate through each detected object:
    • Calculate bounding box coordinates.
    • Find class confidence scores and select the class with the highest score.
    • Store this information in arrays.
  3. Apply NMS to remove duplicate detections.
  4. For each detection remaining after NMS:
    • Draw bounding boxes on the original image.
    • Add the class name of the matched category and the confidence score as text above the box.
  5. Save the result image to local storage.

  // Postprocessing for NMS
  const RBLNTensorLayout *layout = rbln_get_output_layout(rt, 0);
  cv::Mat logits{layout->ndim, layout->shape, CV_32F};
  memcpy(logits.data, data, rbln_get_layout_nbytes(layout));

  std::vector<cv::Rect> nms_boxes;
  std::vector<float> nms_confidences;
  std::vector<size_t> nms_class_ids;
  for (size_t i = 0; i < layout->shape[2]; i++) {
    auto cx = logits.at<float>(0, 0, i);
    auto cy = logits.at<float>(0, 1, i);
    auto w = logits.at<float>(0, 2, i);
    auto h = logits.at<float>(0, 3, i);
    auto x = cx - w / 2;
    auto y = cy - h / 2;
    cv::Rect rect{static_cast<int>(x), static_cast<int>(y), static_cast<int>(w),
                  static_cast<int>(h)};
    float confidence = std::numeric_limits<float>::lowest();
    int cls_id = 0;
    for (size_t j = 4; j < layout->shape[1]; j++) {
      if (confidence < logits.at<float>(0, j, i)) {
        confidence = logits.at<float>(0, j, i);
        cls_id = j - 4;
      }
    }
    nms_boxes.push_back(rect);
    nms_confidences.push_back(confidence);
    nms_class_ids.push_back(cls_id);
  }
  std::vector<int> nms_indices;
  cv::dnn::NMSBoxes(nms_boxes, nms_confidences, 0.25f, 0.45f, nms_indices);

  // Draw output image
  cv::Mat output_img = image.clone();
  for (size_t i = 0; i < nms_indices.size(); i++) {
    auto idx = nms_indices[i];
    auto class_id = nms_class_ids[idx];
    auto scaled_box = ScaleBox(nms_boxes[idx], output_img.size(), 640);
    cv::rectangle(output_img, scaled_box, cv::Scalar(255, 0, 0));
    std::stringstream ss;
    ss << COCO_CATEGORIES[class_id] << ": " << nms_confidences[idx];
    cv::putText(output_img, ss.str(), scaled_box.tl() - cv::Point(0, 1),
                cv::FONT_HERSHEY_DUPLEX, 1, cv::Scalar(255, 0, 0));
  }
  cv::imwrite("result.jpg", output_img);

Release resources

  // Release Runtime
  rbln_destroy_runtime(rt);

  // Release Model
  rbln_destroy_model(mod);

How to build using CMake

The above code snippets are included in the RBLN Model Zoo C++ examples. To compile the code and create the executable binary, simply use the following commands:

${SAMPLE_PATH} denotes the path of the example application (e.g., rbln-model-zoo/cpp/object_detection).

mkdir ${SAMPLE_PATH}/build
cd ${SAMPLE_PATH}/build
cmake ..
make

Note

As previously mentioned, our example application uses OpenCV APIs for image processing tasks. For this purpose, the CMake build system will fetch and install OpenCV directly from its source. Please note that this installation process may require over 5 minutes to complete, depending on your system's specifications and internet connection speed.

How to run the executable files

If you have completed all the steps, you will find the executable files in the build directory, named object_detection and object_detection_async for synchronous and asynchronous inference, respectively.

  • Synchronous execution
    ${SAMPLE_PATH}/build/object_detection -i ${SAMPLE_PATH}/people4.jpg  -m ${SAMPLE_PATH}/yolov8m.rbln
    
  • Asynchronous execution
    ${SAMPLE_PATH}/build/object_detection_async -i ${SAMPLE_PATH}/people4.jpg  -m ${SAMPLE_PATH}/yolov8m.rbln
    

The output will be saved as result.jpg, showing the detected objects with bounding boxes and class labels.