
Image Classification using C/C++ APIs

This tutorial shows how to deploy a PyTorch ResNet50 model using the RBLN SDK C/C++ API. The model is compiled with the RBLN SDK Python API, and the resulting *.rbln file is deployed with the RBLN SDK C/C++ API.

This approach combines the ease of model preparation in Python with the performance benefits of C/C++ for inference. The complete code used in this tutorial can be found in the RBLN Model Zoo.

The tutorial is divided into two parts:

  1. How to compile the PyTorch ResNet50 and save the compiled model with Python API
  2. How to deploy the compiled model in the C/C++ runtime-based inference environment

Prerequisites

Before we start, please make sure you have installed the following packages in your system:

Step 1. How to compile

While the RBLN Python API offers comprehensive functionality, capable of handling both compilation and inference processes within the RBLN SDK, the RBLN SDK C/C++ API is specifically designed and optimized for inference operations only.

In this tutorial, we'll use the RBLN Python API for model compilation and RBLN C/C++ API for inference.

Prepare the model

First, we can import the ResNet50 model from the TorchVision library.

from torchvision import models
import rebel
import torch

model_name = "resnet50"
weights = models.get_model_weights(model_name).DEFAULT
model = getattr(models, model_name)(weights=weights).eval()

Compile the model

Once the PyTorch model (a torch.nn.Module) is prepared, we can compile it with the rebel.compile_from_torch() function.

# Compile the model
compiled_model = rebel.compile_from_torch(model, [("x", [1, 3, 224, 224], "float32")], npu="RBLN-CA02")

If the NPU is installed on your host machine, you can omit the npu argument in the rebel.compile_from_torch() function. In this case, the function will automatically detect and use the installed NPU. However, if the NPU is not installed on your host machine, you need to specify the target NPU using the npu argument to avoid any errors.

Currently, there are two supported NPU names: RBLN-CA02, RBLN-CA12. If you are unsure about the name of your target NPU, you can check it by running the rbln-stat command in the command line on the host machine where the NPU is installed.

Save the compiled model

To save the compiled model to local storage for later deployment, use the compiled_model.save() method:

# Save the compiled model to local storage
compiled_model.save(f"{model_name}.rbln")   # model_name = resnet50

Complete compilation

The compilation code snippets above are included in compile.py. To compile the model and generate the *.rbln file, run compile.py with the following command:

python compile.py --model-name=resnet50

Upon successful completion of this process, you will find resnet50.rbln in your local storage. This file encapsulates the compiled ResNet50 model, ready for deployment using the RBLN SDK C/C++ API.

Step 2. How to deploy using RBLN SDK C/C++ API

Now, we can deploy the model using the RBLN SDK C/C++ API to load the compiled model, run inference, and check the output results.

Prepare CMake build script

This tutorial uses OpenCV for image pre/post-processing and argparse to parse user parameters from the command-line interface (CLI). The following CMake script describes the dependencies on external packages and how to link them with our example application code.

# Define dependencies for external Package
include(FetchContent)
include(cmake/opencv.cmake)
include(cmake/argparse.cmake)

# Define the name of executable
add_executable(image_classification main.cc)

# Update link info for package dependencies: OpenCV
find_package(OpenCV CONFIG REQUIRED)
target_link_libraries(image_classification ${OpenCV_LIBS})

# Update link info for dependencies: RBLN
find_package(rbln CONFIG REQUIRED)
target_link_libraries(image_classification rbln::rbln_runtime)

# Update including dependencies: argparse
target_include_directories(image_classification PRIVATE ${argparse_INCLUDE_DIRS})

Prepare the input

We need to prepare the preprocessed image as input data required for the pre-trained ResNet50 model. Here, we will perform preprocessing on the input image using various vision APIs provided by OpenCV.

  std::string input_path = "${SAMPLE_PATH}/tabby.jpg";

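  // Load the input image
  cv::Mat input_image = cv::imread(input_path);
  // cv::imread returns an empty matrix when the file cannot be read,
  // so check for that instead of relying on an exception
  if (input_image.empty()) {
    std::cerr << "Failed to read image: " << input_path << std::endl;
    std::exit(1);
  }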
  cv::Mat image;
  cv::cvtColor(input_image, image, cv::COLOR_BGR2RGB);

  // Resizing with aspect ratio
  float scale = image.rows < image.cols ? 256. / image.rows : 256. / image.cols;
  cv::resize(image, image, cv::Size(), scale, scale, cv::INTER_LINEAR);

  // Cropping image at center position
  image = image(cv::Rect((image.cols - 224) / 2, (image.rows - 224) / 2, 224, 224));

  // Image Normalization
  image.convertTo(image, CV_32F);
  cv::Vec3f mean(123.68, 116.28, 103.53);
  cv::Vec3f std(58.395, 57.120, 57.375);  // 0.229, 0.224, 0.225 scaled by 255
  for (unsigned i = 0; i < image.rows; i++) {
    for (unsigned j = 0; j < image.cols; j++) {
      cv::subtract(image.at<cv::Vec3f>(i, j), mean, image.at<cv::Vec3f>(i, j));
      cv::divide(image.at<cv::Vec3f>(i, j), std, image.at<cv::Vec3f>(i, j));
    }
  }

  // Convert the HWC image into an NCHW blob (1 x 3 x 224 x 224)
  cv::Mat blob = cv::dnn::blobFromImage(image);
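
To make the last two steps concrete, the following is a minimal standard-library sketch of what the normalization loop and the conversion to a blob accomplish together: each channel is shifted and scaled, and the interleaved HWC pixels are rearranged into the planar NCHW order that matches the [1, 3, 224, 224] input shape used at compile time. The helper name normalize_to_nchw is hypothetical and is not part of OpenCV or the RBLN SDK. The constants are the commonly used ImageNet statistics scaled to the 0-255 range, which the tutorial's values approximate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Commonly used ImageNet channel statistics (RGB order), scaled to 0-255.
constexpr std::array<float, 3> kMean = {0.485f * 255.0f, 0.456f * 255.0f, 0.406f * 255.0f};
constexpr std::array<float, 3> kStd = {0.229f * 255.0f, 0.224f * 255.0f, 0.225f * 255.0f};

// Hypothetical helper: normalizes an interleaved 8-bit RGB image (HWC)
// and returns it as planar float data in NCHW order with N = 1.
std::vector<float> normalize_to_nchw(const std::vector<std::uint8_t> &hwc,
                                     std::size_t height, std::size_t width) {
  constexpr std::size_t channels = 3;
  assert(hwc.size() == height * width * channels);

  std::vector<float> nchw(hwc.size());
  for (std::size_t y = 0; y < height; ++y) {
    for (std::size_t x = 0; x < width; ++x) {
      for (std::size_t c = 0; c < channels; ++c) {
        // Source index: pixels are stored one after another, channels interleaved.
        const float pixel = static_cast<float>(hwc[(y * width + x) * channels + c]);
        // Destination index: one full plane per channel.
        nchw[(c * height + y) * width + x] = (pixel - kMean[c]) / kStd[c];
      }
    }
  }
  return nchw;
}
```

For a 224 x 224 crop, the returned vector holds 3 * 224 * 224 floats, laid out the same way as the blob passed to the runtime in the next step.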

Run inference

The RBLN SDK C/C++ API supports both synchronous and asynchronous inference. The APIs used in this tutorial are briefly described below.

The RBLN API rbln_create_model is used to load the compiled model by passing the path of the saved model as an input argument.

We can use rbln_create_runtime to create a synchronous runtime from RBLNModel, module name, and device ID. For asynchronous operation, you can create an asynchronous runtime by passing the same arguments used for the synchronous runtime to rbln_create_async_runtime.

To assign the input image at runtime, we use rbln_set_input. This API takes as arguments the RBLNRuntime, the index of the input buffer, and the address of the preprocessed buffer. This API is applicable only for synchronous operations.

After all inputs have been set, we can perform synchronous inference by calling rbln_run with RBLNRuntime as an argument. For asynchronous operations, inference can be performed by passing the input and output buffers to rbln_async_run.

Finally, we can retrieve the output buffer containing the inference results using rbln_get_output. This API takes RBLNRuntime and the output index as arguments. For asynchronous operations, since the input and output buffers were passed when calling rbln_async_run, you can directly reference these output buffers.

For the required API usage in each inference mode, please refer to the following two examples:

  • Synchronous Execution

      std::string model_path = "${SAMPLE_PATH}/resnet50.rbln";
    
      RBLNModel *mod = rbln_create_model(model_path.c_str());
      RBLNRuntime *rt = rbln_create_runtime(mod, "default", 0);
    
      // Set input data
      rbln_set_input(rt, 0, blob.data);
    
      // Run sync inference
      rbln_run(rt);
    
      // Get output results
      float *logits = static_cast<float *>(rbln_get_output(rt, 0));
    

  • Asynchronous Execution

      std::string model_path = "${SAMPLE_PATH}/resnet50.rbln";
    
      RBLNModel *mod = rbln_create_model(model_path.c_str());
      RBLNRuntime *rt = rbln_create_async_runtime(mod, "default", 0);
    
      // Alloc output buffer
      auto buf_size = rbln_get_layout_nbytes(rbln_get_output_layout(rt, 0));
      std::vector<float> logits(buf_size/sizeof(float));
    
      // Run async inference
      int rid = rbln_async_run(rt, blob.data, logits.data());
    
      // Wait for inference to complete
      rbln_async_wait(rt, rid, 1000);
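
Because both modes hand raw pointers to the runtime, it can help to check buffer sizes before running. The sketch below derives the expected byte counts from the shapes stated in this tutorial (a [1, 3, 224, 224] float32 input and a (1, 1000) float32 output). The helper names tensor_nbytes and buffer_matches are hypothetical and are not part of the RBLN API; in real code the expected size would typically come from the runtime's layout query, as in the asynchronous example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of bytes occupied by a dense tensor
// with the given shape and element size.
std::size_t tensor_nbytes(const std::vector<std::size_t> &shape, std::size_t elem_size) {
  assert(elem_size > 0);
  std::size_t count = 1;
  for (std::size_t dim : shape) {
    count *= dim;
  }
  return count * elem_size;
}

// Hypothetical helper: true when a caller-provided buffer has exactly
// the size expected for the given shape.
bool buffer_matches(std::size_t buffer_nbytes, const std::vector<std::size_t> &shape,
                    std::size_t elem_size) {
  return buffer_nbytes == tensor_nbytes(shape, elem_size);
}
```

With these shapes, the input buffer is expected to hold 1 * 3 * 224 * 224 * 4 = 602112 bytes and the output buffer 1 * 1000 * 4 = 4000 bytes.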
    

Post Processing

The output logits is a float32 array of shape (1, 1000), where each element is the score of the corresponding category in the ImageNet dataset. We can derive the top-1 index from these logits and use it to look up the corresponding category name in a predefined list of ImageNet classes.

  // Postprocessing
  size_t max_idx = 0;
  // lowest() is the most negative float; min() is the smallest positive value
  float max_val = std::numeric_limits<float>::lowest();
  for (size_t i = 0; i < 1000; i++) {
    if (logits[i] > max_val) {
      max_val = logits[i];
      max_idx = i;
    }
  }

  // Print categorized output
  std::cout << "Predicted category: " << IMAGENET_CATEGORIES[max_idx] << std::endl;
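
The same idea extends to reporting several candidates with confidence values. The following is a sketch, not part of the tutorial's example code: the hypothetical helper top_k returns the k highest-scoring class indices together with softmax probabilities computed from the raw logits. It uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: returns (class index, softmax probability) pairs
// for the k largest logits, sorted from most to least likely.
std::vector<std::pair<std::size_t, float>> top_k(const float *logits, std::size_t n,
                                                 std::size_t k) {
  assert(logits != nullptr && n > 0);
  k = std::min(k, n);
  if (k == 0) {
    return {};
  }

  // Order only the first k indices by descending logit.
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::partial_sort(order.begin(), order.begin() + static_cast<std::ptrdiff_t>(k),
                    order.end(),
                    [logits](std::size_t a, std::size_t b) { return logits[a] > logits[b]; });

  // Numerically stable softmax: subtract the largest logit before exponentiating.
  const float max_logit = logits[order[0]];
  double denom = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    denom += std::exp(static_cast<double>(logits[i] - max_logit));
  }

  std::vector<std::pair<std::size_t, float>> result;
  result.reserve(k);
  for (std::size_t i = 0; i < k; ++i) {
    const double p = std::exp(static_cast<double>(logits[order[i]] - max_logit)) / denom;
    result.emplace_back(order[i], static_cast<float>(p));
  }
  return result;
}
```

Calling top_k(logits, 1000, 5) on the output buffer would yield five index and probability pairs, and each index can be mapped to a name through the same category table used above.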

Release resources

  // Release Runtime
  rbln_destroy_runtime(rt);

  // Release Model
  rbln_destroy_model(mod);
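
In C++ code, manual cleanup like this is easy to skip on an early return. One common option is to wrap each C handle in a std::unique_ptr with a custom deleter so that the destroy call runs automatically. The sketch below illustrates the pattern with stand-in functions (fake_create and fake_destroy) so that it compiles on its own; these stand-ins are hypothetical and are not RBLN API. In real code the deleters would call rbln_destroy_runtime and rbln_destroy_model instead.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-ins for a C-style create/destroy pair (hypothetical, NOT RBLN API).
struct FakeHandle {
  int id;
};
std::vector<int> g_destroyed;  // records the order in which handles are released

FakeHandle *fake_create(int id) { return new FakeHandle{id}; }
void fake_destroy(FakeHandle *handle) {
  g_destroyed.push_back(handle->id);
  delete handle;
}

// Deleter that forwards to the C destroy function.
struct FakeHandleDeleter {
  void operator()(FakeHandle *handle) const { fake_destroy(handle); }
};
using HandlePtr = std::unique_ptr<FakeHandle, FakeHandleDeleter>;

void run_scope() {
  HandlePtr model(fake_create(1));    // declared first, released last
  HandlePtr runtime(fake_create(2));  // declared last, released first
  assert(model != nullptr && runtime != nullptr);
  // ... inference would happen here ...
}  // both handles are released here, in reverse order of declaration
```

Declaring the model wrapper before the runtime wrapper mirrors the runtime-then-model release order used above, since local objects are destroyed in reverse order of declaration.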

How to build using CMake

The complete code for the above API example is included in the RBLN Model Zoo C++ examples. You can build the code and generate the executable binaries with the following commands, where ${SAMPLE_PATH} denotes the path of the example application (e.g., rbln-model-zoo/cpp/image_classification):

mkdir ${SAMPLE_PATH}/build
cd ${SAMPLE_PATH}/build
cmake ..
make

Note

As previously mentioned, our example application uses OpenCV APIs for image processing tasks. For this purpose, the CMake build system will fetch and install OpenCV directly from its source. Please note that this installation process may require over 5 minutes to complete, depending on your system's specifications and internet connection speed.

How to run the executable

If you completed all of the steps above, you can find the executable binaries under the build directory: image_classification for synchronous inference and image_classification_async for asynchronous inference.

  • Synchronous Execution
    ${SAMPLE_PATH}/build/image_classification -i ${SAMPLE_PATH}/tabby.jpg  -m ${SAMPLE_PATH}/resnet50.rbln
    
  • Asynchronous Execution
    ${SAMPLE_PATH}/build/image_classification_async -i ${SAMPLE_PATH}/tabby.jpg  -m ${SAMPLE_PATH}/resnet50.rbln
    

The results will look like this:

Predicted category: tabby