Python API¶
RBLN compile API¶
Classes¶
RBLNCompiledModel¶
A class that holds the compiled binaries. Instances of this class are produced by the compile_from_* functions.
Functions¶
save(path)¶
Serialize the model and save it to disk as an .rbln-formatted file.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to save the serialized data. | required |
create_runtime(*, device=0, tensor_type='np')¶
Create a runtime from this binary.
Note that this function is mutually exclusive with create_async_runtime: once you create a runtime via create_runtime on this instance, you can't call create_async_runtime.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |

Returns:
Type | Description |
---|---|
Runtime | Runtime object that can be run on the RBLN ATOM |
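A minimal sketch of this flow. The SDK calls are shown as comments since they require the RBLN SDK and an attached NPU, and use only names documented on this page; the executable part only prepares a numpy input:

```python
import numpy as np

# Hypothetical runtime creation (requires the RBLN SDK and an NPU):
#
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torch(model, input_info=info)  # see compile API
#   runtime = compiled.create_runtime(device=0, tensor_type="np")
#   y = runtime.run(x)
#
# With tensor_type="np", inputs and outputs are plain numpy arrays:
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
print(x.shape, x.dtype)
```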
create_async_runtime(*, device=0, tensor_type='np', parallel=None)¶
Create an asynchronous runtime from this binary.
Note that this function is mutually exclusive with create_runtime: once you create an asynchronous runtime via create_async_runtime on this instance, you can't call create_runtime.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |
parallel | int | The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, allowing it to process one input while the next is being prepared. This parameter helps manage the NPU's input buffer more efficiently, ensuring that the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime due to its non-blocking nature. Note: use parallel = 2 with caution, as it may not work on some models (e.g. LLMs). It's recommended to benchmark your specific use case to determine whether it provides a performance benefit. | None |

Returns:
Type | Description |
---|---|
AsyncRuntime | Asynchronous runtime object that can be run on the RBLN ATOM |
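A sketch of the asynchronous submit-then-wait pattern. The SDK calls are commented (they need the RBLN SDK and an NPU) and use only names from this page; the executable part only builds a batch of inputs:

```python
import numpy as np

# Hypothetical asynchronous flow (requires the RBLN SDK and an NPU):
#
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torch(model, input_info=info)
#   aruntime = compiled.create_async_runtime(device=0, tensor_type="np")
#   tasks = [aruntime.run(x) for x in batch]             # non-blocking submits
#   outputs = [t.wait(timeout=10.0) for t in tasks]      # AsyncTask.wait
#
# Preparing a small batch of numpy inputs:
batch = [np.zeros((1, 3, 224, 224), dtype=np.float32) for _ in range(4)]
print(len(batch))
```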
get_total_device_alloc(parallel=1)¶
Retrieves the total device memory allocation (in bytes) required for the compiled graph across all NPUs.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
parallel | int | If parallel > 1, the returned allocation size accounts for additional buffer requirements when operating in non-blocking (asynchronous) mode. Defaults to 1. | 1 |

Returns:
Name | Type | Description |
---|---|---|
int | int | The total device memory allocation (in bytes) required for the compiled graph. |
get_alloc_per_node(parallel=1)¶
Retrieves the device memory allocation (in bytes) required for the compiled graph on each individual NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
parallel | int | If parallel > 1, the returned allocation size accounts for additional buffer requirements when operating in non-blocking (asynchronous) mode. Defaults to 1. | 1 |

Returns:
Type | Description |
---|---|
List[int] | A list containing the device memory allocation (in bytes) for each NPU. The length of the list corresponds to the number of NPUs used for tensor parallelism. |
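These queries are useful for capacity planning before creating runtimes. A sketch, with the SDK calls commented (they require the RBLN SDK) and a plain-Python byte formatter as the runnable part:

```python
# Hypothetical memory check before deployment (requires the RBLN SDK):
#
#   compiled = rebel.compile_from_torch(model, input_info=info,
#                                       tensor_parallel_size=4)
#   total = compiled.get_total_device_alloc(parallel=2)
#   per_node = compiled.get_alloc_per_node(parallel=2)   # one entry per NPU
#
# A small helper to render byte counts for such reports:
def human_bytes(n: float) -> str:
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.2f} {unit}"
        n /= 1024
    return f"{n:.2f} TiB"

print(human_bytes(3 * 1024**3))
```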
inspect(path) classmethod¶
Provides metadata about a compiled model without loading it into host memory.
This method retrieves essential details such as:
- Estimated memory usage per device
- Compiler version used for compilation
- I/O tensor names, shapes, and dtypes
- Name of the compiled model
- Target NPU type
- Tensor parallel size (number of required devices)
- Unique identifier (UUID) for the compiled model
- Reasons for graph breaks (if device_func_count == 1, no graph breaks occurred)

Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to the compiled RBLN model file. | required |

Returns:
Type | Description |
---|---|
Dict[str, Any] | Dictionary containing metadata about the compiled model. |
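A hedged sketch of calling inspect. The call itself requires the SDK and is commented; the dictionary below is only illustrative, and its key names are assumptions inferred from the bullet list above, not the SDK's actual keys:

```python
# Hypothetical call (requires the RBLN SDK):
#
#   import rebel                                         # SDK module name assumed
#   meta = rebel.RBLNCompiledModel.inspect("model.rbln")
#
# Illustrative shape of the returned dictionary (key names are assumptions):
meta = {
    "name": "model",
    "compiler_version": "x.y.z",
    "npu": "RBLN-CA12",
    "tensor_parallel_size": 1,
    "device_func_count": 1,   # 1 => no graph breaks
}
print(meta["device_func_count"])
```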
Functions¶
compile_from_torch(mod, input_info=None, example_inputs=None, *, npu=None, tensor_parallel_size=None)¶
Compile a model from a torch.nn.Module.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
mod | Module | A PyTorch module. | required |
input_info | List[Tuple[str, List[int], DType]] | A list of input information, with each entry described as a triple (name, shape, dtype). | None |
example_inputs | List[Tensor] | A list of example input torch tensors that can be used for tracing. If None, tracing may use default dummy inputs derived from input_info. | None |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |
tensor_parallel_size | int | The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
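Building the input_info argument is plain Python; a sketch, with the compile/save calls commented since they require the RBLN SDK and a PyTorch module `model`:

```python
# Hypothetical compile-and-save flow (requires the RBLN SDK):
#
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torch(model, input_info=input_info)
#   compiled.save("model.rbln")
#
# One (name, shape, dtype) triple per model input; the string dtype form is
# an assumption — check your SDK version for the accepted DType representation.
input_info = [("x", [1, 3, 224, 224], "float32")]
print(input_info[0])
```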
compile_from_torchscript(mod, input_names=None, *, npu=None, tensor_parallel_size=None)¶
Compile a model from a torch.jit.ScriptModule, the result of the torch.jit.trace function.
Note that the input shape & dtype information must be preserved to compile a TorchScript model. When loading a TorchScript model via torch.jit.load, this information is dropped by default; pass the additional parameter _restore_shapes=True to recover it.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
mod | TorchScript | A PyTorch jit-traced model. | required |
input_names | Optional[List[Optional[str]]] | A list of input names. | None |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |
tensor_parallel_size | int | The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
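A sketch of the trace/save/reload path described above, assuming PyTorch is available. The reload-and-compile steps are commented since they touch the filesystem and require the RBLN SDK:

```python
import torch

# Trace a tiny module; tracing records the input shape & dtype.
traced = torch.jit.trace(torch.nn.ReLU(), torch.randn(1, 4))

# Hypothetical round trip and compile (_restore_shapes=True keeps the
# shape/dtype info that torch.jit.load would otherwise drop):
#
#   traced.save("relu.pt")
#   reloaded = torch.jit.load("relu.pt", _restore_shapes=True)
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torchscript(reloaded)
print(isinstance(traced, torch.jit.ScriptModule))
```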
compile_from_tf_function(func, input_info, outputs=None, layout='NHWC', *, npu=None)¶
Compile a model from a tf.function.
Note that the input function should not be concretized via the get_concrete_function method.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | GenericFunction | A TensorFlow function. | required |
input_info | List[Tuple[str, List[int], DType]] | A list of input information, with each entry described as a triple (name, shape, dtype). | required |
outputs | Optional[Union[str, List[str]]] | A string or list of names of output node(s) (optional). If not specified, the last node is assumed to be the graph output. This may be useful when the graph has multiple outputs. | None |
layout | str | Layout of the tensors used internally in the model. One of "NHWC" or "NCHW". | 'NHWC' |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
compile_from_tf_graph_def(graph_def, outputs=None, layout='NHWC', *, npu=None)¶
Compile a model from a TensorFlow GraphDef.
This function allows you to compile TensorFlow V1.x legacy models. If you are using TensorFlow V2, we recommend compiling the model in its function form using compile_from_tf_function.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph_def | GraphDef | A TensorFlow graph definition in the form of a protocol buffer. | required |
outputs | Optional[Union[str, List[str]]] | A string or list of names of output node(s) (optional). If not specified, the last node is assumed to be the graph output. This may be useful when the graph has multiple outputs. | None |
layout | str | Layout of the tensors used internally in the model. One of "NHWC" or "NCHW". | 'NHWC' |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
torch.compile API¶
Functions¶
compile(model=None, *, dynamic=None, backend='inductor', options=None)¶
Optimizes the given model/function using TorchDynamo with the RBLN backend for execution on RBLN hardware.
This function compiles the input model to run efficiently on RBLN NPUs. It leverages TorchDynamo for tracing and the RBLN backend for generating optimized code tailored to RBLN hardware. To use this backend, ensure that the RBLN SDK is imported before calling torch.compile.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Callable | Module/function to optimize. | None |
dynamic | bool or None | Use dynamic shape tracing. The RBLN backend currently does not support dynamic shapes. | None |
backend | str or Callable | Backend to be used. | 'inductor' |
options | dict | A dictionary of options to pass to the backend. | None |
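A hedged sketch of this entry point, assuming PyTorch is available. The backend string "rbln" is an assumption (the page states only that the SDK registers a backend for torch.compile), so that path is commented and the module runs eagerly instead:

```python
import torch

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet().eval()

# Hypothetical RBLN path (backend name "rbln" is an assumption; the SDK must
# be imported first so the backend registers with TorchDynamo):
#
#   import rebel                                         # SDK module name assumed
#   compiled = torch.compile(model, backend="rbln")
#   y = compiled(torch.randn(1, 4))
#
# Eager execution of the same module:
y = model(torch.randn(1, 4))
print(tuple(y.shape))
```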
runtime API¶
Classes¶
RuntimeBase¶
A base class providing common functionality for both synchronous and asynchronous runtimes.
The RuntimeBase class serves as a foundation for the Runtime and AsyncRuntime classes. It encapsulates the shared methods and attributes that manage execution of the model.
Functions¶
model_description()¶
Returns a description of the model currently loaded in the runtime.
This method provides a summary of the model's architecture, including details about its inputs, outputs, and memory usage on the RBLN device.

Returns:
Name | Type | Description |
---|---|---|
str | str | A string containing the model's description. |
flush_reports()¶
Fetch and discard all pending reports from the runtime.
This method streams all available reports using _stream_reports() and then clears the internal _reports list.
get_reports()¶
Retrieve all pending reports from the runtime.
This method fetches all available reports and returns their contents while preserving their order.

Returns:
Type | Description |
---|---|
List[Dict[str, Any]] | A list of report dictionaries. |

Note
Timer reports are only available when the environment variable RBLN_RUNTIME_TIMER=1 is set. Without this setting, timer reports will not be generated.
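A sketch of enabling timer reports and fetching them. Setting the variable in the launching shell works as well; the retrieval call is commented since it needs a live runtime:

```python
import os

# Timer reports require this environment variable (see Note above):
os.environ["RBLN_RUNTIME_TIMER"] = "1"

# Hypothetical report retrieval (requires a live runtime):
#
#   for report in runtime.get_reports():
#       print(report)
print(os.environ["RBLN_RUNTIME_TIMER"])
```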
get_elapsed_times()¶
Calculate the average total execution time of operations in microseconds.
This method iterates over the available timer reports, sums their total values, and computes the average across all such reports.

Returns:
Name | Type | Description |
---|---|---|
float | float | Average execution time in microseconds, or 0.0 if no timer reports are available. |

Note
Timer reports are only available when the environment variable RBLN_RUNTIME_TIMER=1 is set. Without this setting, timer reports will not be generated.
get_elapsed_device_times()¶
Calculate the average device-side execution time of operations in microseconds.
This method iterates over the available timer reports, sums their total_device values, and computes the average across all such reports.

Returns:
Name | Type | Description |
---|---|---|
float | float | Average device execution time in microseconds, or 0.0 if no timer reports are available. |

Note
Timer reports are only available when the environment variable RBLN_RUNTIME_TIMER=1 is set. Without this setting, timer reports will not be generated.
Runtime¶
A Runtime object for executing a compiled neural network on an NPU.
Functions¶
__init__(compiled_model, *, device=0, tensor_type='np', path=None, activate_profiler=False)¶
Initializes a Runtime object for executing a compiled neural network on an NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
compiled_model | Union[str, RBLNCompiledModel] | The path to the compiled RBLN neural network file (*.rbln) or an instance of RBLNCompiledModel. | required |
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |
path | str | Deprecated. Use 'compiled_model' instead. | None |
activate_profiler | bool | Whether to activate profiling for this runtime instance. Defaults to False. | False |
run(*input_args, out=None, **input_kwargs)¶
Runs the compiled neural network with the given input tensors.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list of tensors to store the outputs. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new output tensors are allocated. | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. The return value depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the Runtime object's initialization. |
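A sketch of the out parameter for buffer reuse. The inference loop is commented since it needs a live runtime; the (1, 1000) output shape is illustrative only:

```python
import numpy as np

# Pre-allocate output buffers once and reuse them across calls; shapes must
# match the network's output shapes (the (1, 1000) shape here is illustrative):
out_buf = [np.empty((1, 1000), dtype=np.float32)]

# Hypothetical repeated inference (requires a live runtime):
#
#   for x in inputs:
#       runtime.run(x, out=out_buf)   # results written into out_buf[0]
print(out_buf[0].shape)
```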
forward(*input_args, out=None, **input_kwargs)¶
An alias for the run method.
This method is provided for compatibility with PyTorch's naming convention.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list of tensors to store the outputs. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new output tensors are allocated. | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network, as returned by the run method. |
__call__(*input_args, out=None, **input_kwargs)¶
Allows the Runtime object to be called as a function.
This method is provided for convenience and compatibility with common neural network frameworks.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list of tensors to store the outputs. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new output tensors are allocated. | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network, as returned by the run method. |
AsyncRuntime¶
An AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.
Functions¶
__init__(compiled_model, *, device=0, tensor_type='np', path=None, parallel=None)¶
Initializes an AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
compiled_model | Union[str, RBLNCompiledModel] | The path to the compiled RBLN neural network file (*.rbln) or an instance of RBLNCompiledModel. | required |
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |
path | str | Deprecated. Use 'compiled_model' instead. | None |
parallel | int | The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, allowing it to process one input while the next is being prepared. This parameter helps manage the NPU's input buffer more efficiently, ensuring that the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime due to its non-blocking nature. Note: use parallel = 2 with caution, as it may not work on some models (e.g. LLMs). It's recommended to benchmark your specific use case to determine whether it provides a performance benefit. | None |
run(*input_args, **input_kwargs)¶
Runs the compiled neural network asynchronously with the given input tensors.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
AsyncTask | An asynchronous task object representing the neural network execution. The task object can be used to wait for the execution to finish. |
async_run(*input_args, **input_kwargs) async¶
Runs the compiled neural network asynchronously and returns the result awaitably.
This method is a coroutine that can be used with the await keyword to asynchronously run the neural network and retrieve the result.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. The return type depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the AsyncRuntime object's initialization. |
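The awaiting pattern above can be sketched as follows. The SDK-dependent coroutine is commented (it needs a live AsyncRuntime `aruntime`); a stand-in coroutine demonstrates the same asyncio usage:

```python
import asyncio

# Hypothetical coroutine usage (requires a live AsyncRuntime `aruntime`):
#
#   async def infer(x):
#       return await aruntime.async_run(x)
#   result = asyncio.run(infer(x))
#
# The same awaiting pattern with a stand-in coroutine:
async def infer(x):
    await asyncio.sleep(0)   # stands in for aruntime.async_run(x)
    return x * 2

print(asyncio.run(infer(21)))
```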
AsyncTask¶
Functions¶
wait(timeout=None)¶
Waits for the asynchronous task to complete and returns the result.
This method blocks the calling thread until the task completes or the specified timeout is reached.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | Optional[float] | The maximum amount of time (in seconds) to wait for the task to complete. If None, the method waits indefinitely until the task completes. Defaults to None. | None |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. The return value depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the AsyncRuntime object's initialization. |
utility API¶
Functions¶
npu_is_available(device=0)¶
Return a bool indicating whether the RBLN device is currently available.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | Index of the NPU. Defaults to 0. | 0 |

Returns:
Type | Description |
---|---|
bool | A bool indicating whether the RBLN device is currently available. |
get_npu_name(device=0)¶
Return the name of the RBLN NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | Index of the NPU. Defaults to 0. | 0 |

Returns:
Type | Description |
---|---|
str | The corresponding name of the NPU, e.g. "RBLN-CA12". |
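The two utilities above combine naturally into a best-effort device probe. A sketch; the SDK module name `rebel` is an assumption, and the helper degrades gracefully when the SDK is absent:

```python
def describe_npu(device: int = 0) -> str:
    """Best-effort probe using npu_is_available / get_npu_name; the `rebel`
    module name is an assumption for the RBLN SDK's Python package."""
    try:
        import rebel  # hypothetical SDK module name
    except ImportError:
        return "RBLN SDK not installed"
    if not rebel.npu_is_available(device=device):
        return f"no NPU available on device {device}"
    return rebel.get_npu_name(device=device)  # e.g. "RBLN-CA12"

print(describe_npu())
```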