Troubleshoot
How to generate Debug Dump Binaries (DDB)
The DDB contains useful information for functional debugging of the RBLN NPU, such as the input to the RBLN Compiler, the error log of each compile pass, and the progress status of the compilation. Note that all DDB files are securely encrypted.
You can generate the DDB by setting the environment variable RBLN_DEBUG_LEVEL:
RBLN_DEBUG_LEVEL=1: DDB generation, without model parameters
RBLN_DEBUG_LEVEL=2: DDB generation, including model parameters
Setting RBLN_DEBUG_LEVEL=2 gives more complete information for debugging, but if you cannot share the model parameters, RBLN_DEBUG_LEVEL=1 is a suitable option.
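As a minimal sketch of one way to apply this setting from Python, the helper below copies the current environment, sets RBLN_DEBUG_LEVEL, and launches a compile script in a child process. The function names and the script path are hypothetical and only illustrate the idea. The helper accepts only the two levels described above.

```python
import os
import subprocess
import sys


def env_with_debug_level(level, base_env=None):
    """Return a copy of the environment with RBLN_DEBUG_LEVEL set."""
    # Only the two documented DDB levels are accepted in this sketch.
    if level not in (1, 2):
        raise ValueError("this sketch only accepts RBLN_DEBUG_LEVEL 1 or 2")
    env = dict(os.environ if base_env is None else base_env)
    env["RBLN_DEBUG_LEVEL"] = str(level)
    return env


def run_with_ddb(script_path, level=1):
    """Run a compile script in a child process with DDB generation enabled."""
    # script_path is a hypothetical compile script supplied by the caller.
    completed = subprocess.run(
        [sys.executable, script_path],
        env=env_with_debug_level(level),
        check=False,
    )
    return completed.returncode
```

Setting the variable only for the child process keeps the parent shell unchanged, so later runs do not generate DDB files by accident.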
Here is an example of how to generate the DDB for the PyTorch ResNet50 model in the RBLN Model Zoo:
You can see:
We recommend that you create a tarball containing all DDB files and submit it via RBLN Portal > Technical Supports with a detailed description for further assistance.
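The bundling step could be sketched with Python's standard tarfile module as shown below. The directory that holds the DDB files and the archive name are assumptions here, since where the files are written depends on your setup. Pass in whichever directory contains them.

```python
import tarfile
from pathlib import Path


def bundle_ddb(ddb_dir, archive_path="ddb_report.tar.gz"):
    """Pack every file under ddb_dir into one gzip-compressed tarball."""
    ddb_dir = Path(ddb_dir)
    # Collect the file list first so the archive itself is never included.
    files = sorted(p for p in ddb_dir.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files found under {ddb_dir}")
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            # Store paths relative to ddb_dir so the archive unpacks cleanly.
            tar.add(path, arcname=path.relative_to(ddb_dir).as_posix())
    return Path(archive_path)
```

For example, `bundle_ddb("./ddb_output")` would write `ddb_report.tar.gz` to the current directory, where `./ddb_output` is a placeholder for the folder containing your DDB files.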
Performance Tuning
While most of the compiled model consists of NPU operations, some CPU-based operations may still be present. The performance of these operations can vary based on the CPU host environment.
To improve the performance of the CPU-based operations, you can adjust the number of threads using the following methods.
1. Adjusting the Number of Threads
Option 1. Using an Environment Variable
Set the number of threads by defining the environment variable before running the model:
Option 2. Modifying the Runtime Property
You can also set the number of threads directly using the Python runtime API:
2. Determining the Optimal Number of Threads
The optimal number of threads depends on your CPU host environment. While you can manually adjust it, we provide a utility function, search_num_threads(), to automate the process:
In this example, 16 threads provide the best performance. The optimal thread count may vary depending on your CPU host system, so we recommend running this benchmark to find the best value.
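If you prefer to adjust the value manually, a sweep over candidate thread counts can be sketched as follows. This is not how search_num_threads() works internally. It is a generic timing loop in which `run_once` (one inference call) and `set_threads` (whatever mechanism you use to apply the thread count) are hypothetical callbacks that you supply.

```python
import time


def pick_best_thread_count(run_once, set_threads, candidates, repeats=3,
                           timer=time.perf_counter):
    """Time run_once() under each candidate thread count.

    Returns (best_count, timings), where timings maps each candidate to its
    fastest observed run in the units of `timer`.
    """
    timings = {}
    for n in candidates:
        set_threads(n)  # hypothetical hook: apply the thread count
        run_once()      # warm-up run, excluded from the measurement
        samples = []
        for _ in range(repeats):
            start = timer()
            run_once()
            samples.append(timer() - start)
        # Best-of-N reduces noise from other processes on the host.
        timings[n] = min(samples)
    best = min(timings, key=timings.get)
    return best, timings
```

Taking the minimum rather than the mean is a design choice that favors repeatability on a busy host. Switch to a mean or median if you care more about typical latency than about the best case.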