Troubleshoot

How to generate `Debug Dump Binaries (DDB)`

The DDB contains useful information for functional debugging of the RBLN NPU, such as the input of the RBLN Compiler, the error log of each compile pass, and the progress status of the compilation. Note that all DDB files are securely encrypted.

You can generate the DDB by setting the environment variable RBLN_DEBUG_LEVEL:

RBLN_DEBUG_LEVEL=1: DDB generation, without model parameters
RBLN_DEBUG_LEVEL=2: DDB generation, including model parameters

RBLN_DEBUG_LEVEL=2 provides the most complete information for debugging. If you cannot share the model parameters, use RBLN_DEBUG_LEVEL=1 instead.

Here is an example of how to generate the DDB for the PyTorch ResNet50 model in the RBLN Model Zoo:

$ cd rbln_model_zoo/pytorch/torchvisions
$ RBLN_DEBUG_LEVEL=2 python3 main.py --model_name resnet50
$ ls ./debug_mm_dd_yyyy_hh_mm_ss/

The directory contains the encrypted DDB files:

`0_graph.json.gz.enc 100_graph.json.gz.enc error_log.txt.enc progress.txt.enc`

We recommend creating a tarball containing all DDB files and submitting it, together with a detailed description of the issue, via RBLN Portal > Technical Supports for further assistance:

$ tar -zcvf debug_mm_dd_yyyy_hh_mm_ss.tar.gz debug_mm_dd_yyyy_hh_mm_ss/
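If you prefer to script the packaging step, the same archive can be produced with Python's standard `tarfile` module. The following is a minimal sketch, not part of the RBLN SDK; the helper name `archive_ddb` is hypothetical, and it assumes the DDB directory already exists on disk.

```python
import tarfile
from pathlib import Path


def archive_ddb(debug_dir: str) -> Path:
    """Pack a DDB directory into <name>.tar.gz beside it and return the archive path.

    Hypothetical helper for illustration; it only mirrors `tar -zcf`.
    """
    src = Path(debug_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"DDB directory not found: {src}")

    archive = src.parent / f"{src.name}.tar.gz"
    # "w:gz" writes a gzip-compressed tar, like the -z flag of tar.
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the absolute path.
        tar.add(src, arcname=src.name)
    return archive
```

Calling `archive_ddb("./debug_mm_dd_yyyy_hh_mm_ss")` with the actual directory name writes the archive next to that directory and returns its path.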

Performance Tuning

While most of the compiled model consists of NPU operations, some CPU-based operations may still be present. The performance of these operations can vary based on the CPU host environment.

To improve the performance of the CPU-based operations, you can adjust the number of threads using the following methods.

1. Adjusting the Number of Threads

Option 1. Using an Environment Variable

Set the number of threads by defining the environment variable before running the model:

# Export the environment variable
$ export RBLN_NUM_THREADS=<number_of_threads>

# Run your model script
$ python run_model.py

Alternatively, you can set it in a single line:

$ RBLN_NUM_THREADS=<number_of_threads> python run_model.py
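When the model script is launched from another Python program (for example, a benchmark driver), the single-line form above can be approximated with `subprocess`. This is a sketch under assumptions: the helper names `env_with_threads` and `run_with_threads` are hypothetical, and the variable is passed to a child process rather than changed in the current one.

```python
import os
import subprocess
import sys


def env_with_threads(num_threads: int) -> dict:
    """Return a copy of the current environment with RBLN_NUM_THREADS set."""
    if num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    env = os.environ.copy()
    env["RBLN_NUM_THREADS"] = str(num_threads)  # environment values must be strings
    return env


def run_with_threads(script: str, num_threads: int) -> int:
    """Run `script` in a child Python process and return its exit code."""
    result = subprocess.run(
        [sys.executable, script],
        env=env_with_threads(num_threads),
    )
    return result.returncode
```

For example, `run_with_threads("run_model.py", 8)` starts `run_model.py` with `RBLN_NUM_THREADS=8` and leaves the parent process environment unchanged.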

Option 2. Modifying the Runtime Property

You can also set the number of threads directly using the Python runtime API:

import rebel

# Load the compiled model, then set the number of CPU threads
module = rebel.Runtime(f"{rbln_file_name}.rbln")
module.num_threads = <number_of_threads>

2. Determining the Optimal Number of Threads

The optimal number of threads depends on your CPU host environment. While you can manually adjust it, we provide a utility function, search_num_threads(), to automate the process:

import rebel
from rebel.core.tools import search_num_threads

# Load the compiled model, then benchmark candidate thread counts
module = rebel.Runtime(f"{rbln_file_name}.rbln")
search_num_threads(module)

This function benchmarks different thread counts and prints the average execution time for each, helping you identify the most efficient thread count for your CPU host environment:

INFO [rebel-compiler] Max Concurrency = 48
INFO [rebel-compiler] Current num threads = 24
INFO [rebel-compiler] Testing with 1 threads.
INFO [rebel-compiler] 200 runs: Average execution time = 143.10 µs
INFO [rebel-compiler] Testing with 2 threads.
INFO [rebel-compiler] 200 runs: Average execution time = 80.15 µs
INFO [rebel-compiler] Testing with 4 threads.
INFO [rebel-compiler] 200 runs: Average execution time = 49.03 µs
INFO [rebel-compiler] Testing with 8 threads.
INFO [rebel-compiler] 200 runs: Average execution time = 43.47 µs
INFO [rebel-compiler] Testing with 16 threads.
INFO [rebel-compiler] 200 runs: Average execution time = 39.80 µs
INFO [rebel-compiler] Testing with 32 threads.
INFO [rebel-compiler] 200 runs: Average execution time = 44.66 µs

In this example, 16 threads provide the best performance. The optimal thread count may vary depending on your CPU host system, so we recommend running this benchmark to find the best value.
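If you save the benchmark output to a file or string, picking the fastest setting can be automated. The sketch below is not part of the RBLN SDK: the function name `best_thread_count` is hypothetical, and it assumes the log lines follow the format shown in the example above, which may differ between SDK versions.

```python
import re
from typing import Tuple

# Patterns based on the example log lines above (assumed format).
TESTING_RE = re.compile(r"Testing with (\d+) threads")
AVERAGE_RE = re.compile(r"Average execution time = ([\d.]+)")


def best_thread_count(log_text: str) -> Tuple[int, float]:
    """Return (threads, average_time) for the fastest configuration in the log."""
    results = {}
    current = None
    for line in log_text.splitlines():
        match = TESTING_RE.search(line)
        if match:
            current = int(match.group(1))  # remember which thread count is being tested
            continue
        match = AVERAGE_RE.search(line)
        if match and current is not None:
            results[current] = float(match.group(1))
            current = None
    if not results:
        raise ValueError("no benchmark results found in log")
    threads = min(results, key=results.get)  # lowest average time wins
    return threads, results[threads]
```

The returned thread count can then be applied through `RBLN_NUM_THREADS` or `module.num_threads` as described in the previous section.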
