Troubleshooting

How to generate a core dump file

If you encounter a problem while running PyTorch RBLN, please send the generated core dump file to client_support@rebellions.ai. To create a core dump file, first lift the core file size limit in the current shell with the following command.

$ ulimit -c unlimited

Verify that the ulimit restrictions have been removed by running:

$ ulimit -c
unlimited
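If the limit cannot be changed in the shell that launches the script, a similar effect can be approximated from inside Python. The following is a minimal sketch using the standard-library resource module; it is not part of PyTorch RBLN, and the helper name raise_core_limit is hypothetical. It raises the soft limit only as far as the hard limit allows, so the result is "unlimited" only if the hard limit already is.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit for this process.

    Hypothetical helper for illustration. Call it before the code that may
    crash; child processes started afterwards inherit the new limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    # resource.RLIM_INFINITY corresponds to "unlimited" in the shell.
    print("unlimited" if soft == resource.RLIM_INFINITY else soft)
```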

Re-run the problematic model script in the same shell. When the error occurs, a core dump file is created under /var/crash.

$ ls -l /var/crash/*
-rw-r----- 1 rebel1    root   779026 Jul  2 17:50 /var/crash/_usr_bin_python3.10.2029.crash
-rw-r----- 1 rebel2    root 94849351 Jun 25 18:27 /var/crash/_usr_bin_python3.10.2035.crash
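When several crash files accumulate, it helps to find the most recent one before sending it. The sketch below lists the .crash files in a directory, newest first. The function name list_crash_files is hypothetical, and the code assumes the files use the .crash suffix shown in the listing above and that the current user can read the directory.

```python
from pathlib import Path


def list_crash_files(crash_dir="/var/crash"):
    """Return the *.crash files in crash_dir, most recently modified first.

    Hypothetical helper for illustration; returns an empty list if the
    directory does not exist.
    """
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    files = [p for p in directory.glob("*.crash") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_crash_files():
        print(path, path.stat().st_size, "bytes")
```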

Logging Operators Running on CPU

When PyTorch RBLN encounters a PyTorch operator or data type that it does not yet support, it runs that operation on the CPU instead so that the model still executes.

While this fallback improves model compatibility, operations that run on the CPU do not benefit from NPU acceleration. When optimizing a model, it is therefore important to identify which operations fall back to the CPU.

By default, the PyTorch RBLN log level is WARNING, so DEBUG messages are not displayed. To see every operator that runs on the CPU, explicitly set the TORCH_RBLN_LOG_LEVEL environment variable to DEBUG as shown below.

Usage:

$ export TORCH_RBLN_LOG_LEVEL=DEBUG

To restore the default, set the environment variable back to WARNING.

$ export TORCH_RBLN_LOG_LEVEL=WARNING
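To enable DEBUG logging for a single run without changing the shell environment, the variable can be set only for a child process. The following is a minimal sketch using the standard-library subprocess module. The helper names and the script name run_model.py are hypothetical; only the variable name TORCH_RBLN_LOG_LEVEL and its DEBUG value come from this page.

```python
import os
import subprocess
import sys


def env_with_debug_logging(base_env=None):
    """Return a copy of the environment with TORCH_RBLN_LOG_LEVEL=DEBUG.

    The caller's environment is left unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_RBLN_LOG_LEVEL"] = "DEBUG"
    return env


def run_with_debug_logging(script_path):
    """Run a Python script in a child process with DEBUG logging enabled."""
    return subprocess.run(
        [sys.executable, script_path],
        env=env_with_debug_logging(),
        check=False,
    )


# Example (run_model.py is a hypothetical script name):
# run_with_debug_logging("run_model.py")
```

Because the variable is set only in the child's environment, the parent shell keeps its default WARNING level and nothing needs to be reset afterwards.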

Example Output:

With this setting, running a model in Eager Mode prints a log message like the one below whenever an operation runs on the CPU instead of the Rebellions NPU. The message contains the operator's name and, if traceable, the source code location.

[TORCH-RBLN][DEBUG] 'aten::pow' ran on CPU instead of RBLN
/transformers/models/llama/modeling_llama.py:73: UserWarning: TRACE
  variance = hidden_states.pow(2).mean(-1, keepdim=True)
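For a large model, the DEBUG output can contain many such messages. The sketch below counts how often each operator fell back to the CPU in a captured log. The function name count_cpu_fallbacks is hypothetical, and the regular expression assumes that every fallback message follows the single example line shown above; the format may differ between versions.

```python
import re
from collections import Counter

# Pattern derived from the example message on this page (an assumption).
FALLBACK_PATTERN = re.compile(
    r"\[TORCH-RBLN\]\[DEBUG\] '([^']+)' ran on CPU instead of RBLN"
)


def count_cpu_fallbacks(lines):
    """Count CPU fallback messages per operator name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FALLBACK_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Reads a captured log from standard input and prints the most
    # frequent fallback operators first.
    for op_name, count in count_cpu_fallbacks(sys.stdin).most_common():
        print(f"{count:6d}  {op_name}")
```

To use it, save the block to a file, capture the model's output to a log file, and feed that log to the script on standard input. This page does not state which stream the messages are written to, so capturing both standard output and standard error is the safer choice.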