Python API¶
RBLN compile API¶
Classes¶
RBLNCompiledModel¶
A class that holds the compiled binaries. Instances of this class are produced by the compile_from_* functions.
Functions¶
save(path)¶
Serialize the model and save it to disk as an .rbln-formatted file.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to save the serialized data. | required |
create_runtime(*, device=0, tensor_type='np')¶
Create a runtime from this binary.
Note that this function is mutually exclusive with create_async_runtime: once you create a runtime via create_runtime on this instance, you can't call create_async_runtime.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |

Returns:
Type | Description |
---|---|
Runtime | Runtime object that can be run on the RBLN ATOM |
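A minimal sketch of this flow. The SDK calls are shown as comments since they require the RBLN SDK and an attached NPU, and use only names documented on this page; the executable part only prepares a numpy input:

```python
import numpy as np

# Hypothetical runtime creation (requires the RBLN SDK and an NPU):
#
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torch(model, input_info=info)  # see compile API
#   runtime = compiled.create_runtime(device=0, tensor_type="np")
#   y = runtime.run(x)
#
# With tensor_type="np", inputs and outputs are plain numpy arrays:
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
print(x.shape, x.dtype)
```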
create_async_runtime(*, device=0, tensor_type='np', parallel=None)¶
Create an asynchronous runtime from this binary.
Note that this function is mutually exclusive with create_runtime: once you create an asynchronous runtime via create_async_runtime on this instance, you can't call create_runtime.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |
parallel | int | The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, allowing it to process one input while the next is being prepared. This parameter helps manage the NPU's input buffer more efficiently, ensuring that the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime due to its non-blocking nature. Note: use parallel = 2 with caution, as it may not work on some models (e.g. LLMs). It's recommended to benchmark your specific use case to determine whether it provides a performance benefit. | None |

Returns:
Type | Description |
---|---|
AsyncRuntime | Asynchronous runtime object that can be run on the RBLN ATOM |
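A sketch of the asynchronous submit-then-wait pattern. The SDK calls are commented (they need the RBLN SDK and an NPU) and use only names from this page; the executable part only builds a batch of inputs:

```python
import numpy as np

# Hypothetical asynchronous flow (requires the RBLN SDK and an NPU):
#
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torch(model, input_info=info)
#   aruntime = compiled.create_async_runtime(device=0, tensor_type="np")
#   tasks = [aruntime.run(x) for x in batch]             # non-blocking submits
#   outputs = [t.wait(timeout=10.0) for t in tasks]      # AsyncTask.wait
#
# Preparing a small batch of numpy inputs:
batch = [np.zeros((1, 3, 224, 224), dtype=np.float32) for _ in range(4)]
print(len(batch))
```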
get_total_device_alloc(parallel=1)¶
Retrieves the total device memory allocation (in bytes) required for the compiled graph across all NPUs.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
parallel | int | If parallel > 1, the returned allocation size accounts for additional buffer requirements when operating in non-blocking (asynchronous) mode. Defaults to 1. | 1 |

Returns:
Name | Type | Description |
---|---|---|
int | int | The total device memory allocation (in bytes) required for the compiled graph. |
get_alloc_per_node(parallel=1)¶
Retrieves the device memory allocation (in bytes) required for the compiled graph on each individual NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
parallel | int | If parallel > 1, the returned allocation size accounts for additional buffer requirements when operating in non-blocking (asynchronous) mode. Defaults to 1. | 1 |

Returns:
Type | Description |
---|---|
List[int] | A list containing the device memory allocation (in bytes) for each NPU. The length of the list corresponds to the number of NPUs used for tensor parallelism. |
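These queries are useful for capacity planning before creating runtimes. A sketch, with the SDK calls commented (they require the RBLN SDK) and a plain-Python byte formatter as the runnable part:

```python
# Hypothetical memory check before deployment (requires the RBLN SDK):
#
#   compiled = rebel.compile_from_torch(model, input_info=info,
#                                       tensor_parallel_size=4)
#   total = compiled.get_total_device_alloc(parallel=2)
#   per_node = compiled.get_alloc_per_node(parallel=2)   # one entry per NPU
#
# A small helper to render byte counts for such reports:
def human_bytes(n: float) -> str:
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024:
            return f"{n:.2f} {unit}"
        n /= 1024
    return f"{n:.2f} TiB"

print(human_bytes(3 * 1024**3))
```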
inspect(path) classmethod¶
Provides metadata about a compiled model without loading it into host memory.
This method retrieves essential details such as:
- Estimated memory usage per device
- Compiler version used for compilation
- I/O tensor names, shapes, and dtypes
- Name of the compiled model
- Target NPU type
- Tensor parallel size (number of required devices)
- Unique identifier (UUID) for the compiled model
- Reasons for graph breaks (if device_func_count == 1, no graph breaks occurred)

Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to the compiled RBLN model file. | required |

Returns:
Type | Description |
---|---|
Dict[str, Any] | Dictionary containing metadata about the compiled model. |
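A hedged sketch of calling inspect. The call itself requires the SDK and is commented; the dictionary below is only illustrative, and its key names are assumptions inferred from the bullet list above, not the SDK's actual keys:

```python
# Hypothetical call (requires the RBLN SDK):
#
#   import rebel                                         # SDK module name assumed
#   meta = rebel.RBLNCompiledModel.inspect("model.rbln")
#
# Illustrative shape of the returned dictionary (key names are assumptions):
meta = {
    "name": "model",
    "compiler_version": "x.y.z",
    "npu": "RBLN-CA12",
    "tensor_parallel_size": 1,
    "device_func_count": 1,   # 1 => no graph breaks
}
print(meta["device_func_count"])
```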
Functions¶
compile_from_torch(mod, input_info=None, example_inputs=None, *, npu=None, tensor_parallel_size=None)¶
Compile a model from a torch.nn.Module.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
mod | Module | A PyTorch module. | required |
input_info | List[Tuple[str, List[int], DType]] | A list of input information, with each entry described as a triple (name, shape, dtype). | None |
example_inputs | List[Tensor] | A list of example input torch tensors that can be used for tracing. If None, tracing may use default dummy inputs derived from input_info. | None |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |
tensor_parallel_size | int | The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
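Building the input_info argument is plain Python; a sketch, with the compile/save calls commented since they require the RBLN SDK and a PyTorch module `model`:

```python
# Hypothetical compile-and-save flow (requires the RBLN SDK):
#
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torch(model, input_info=input_info)
#   compiled.save("model.rbln")
#
# One (name, shape, dtype) triple per model input; the string dtype form is
# an assumption — check your SDK version for the accepted DType representation.
input_info = [("x", [1, 3, 224, 224], "float32")]
print(input_info[0])
```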
compile_from_torchscript(mod, input_names=None, *, npu=None, tensor_parallel_size=None)¶
Compile a model from a torch.jit.ScriptModule, the result of the torch.jit.trace function.
Note that the input shape & dtype information must be preserved to compile a TorchScript model. When loading a TorchScript model via torch.jit.load, this information is dropped by default; pass the additional parameter _restore_shapes=True to recover it.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
mod | TorchScript | A PyTorch jit-traced model. | required |
input_names | Optional[List[Optional[str]]] | A list of input names. | None |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |
tensor_parallel_size | int | The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
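A sketch of the trace/save/reload path described above, assuming PyTorch is available. The reload-and-compile steps are commented since they touch the filesystem and require the RBLN SDK:

```python
import torch

# Trace a tiny module; tracing records the input shape & dtype.
traced = torch.jit.trace(torch.nn.ReLU(), torch.randn(1, 4))

# Hypothetical round trip and compile (_restore_shapes=True keeps the
# shape/dtype info that torch.jit.load would otherwise drop):
#
#   traced.save("relu.pt")
#   reloaded = torch.jit.load("relu.pt", _restore_shapes=True)
#   import rebel                                         # SDK module name assumed
#   compiled = rebel.compile_from_torchscript(reloaded)
print(isinstance(traced, torch.jit.ScriptModule))
```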
compile_from_tf_function(func, input_info, outputs=None, layout='NHWC', *, npu=None)¶
Compile a model from a tf.function.
Note that the input function should not be concretized via the get_concrete_function method.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | GenericFunction | A TensorFlow function. | required |
input_info | List[Tuple[str, List[int], DType]] | A list of input information, with each entry described as a triple (name, shape, dtype). | required |
outputs | Optional[Union[str, List[str]]] | A string or list of names of output node(s) (optional). If not specified, the last node is assumed to be the graph output. This may be useful when the graph has multiple outputs. | None |
layout | str | Layout of the tensors used internally in the model. One of "NHWC" or "NCHW". | 'NHWC' |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
compile_from_tf_graph_def(graph_def, outputs=None, layout='NHWC', *, npu=None)¶
Compile a model from a TensorFlow GraphDef.
This function allows you to compile TensorFlow V1.x legacy models. If you are using TensorFlow V2, we recommend compiling the model in its function form using compile_from_tf_function.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph_def | GraphDef | A TensorFlow graph definition in the form of a protocol buffer. | required |
outputs | Optional[Union[str, List[str]]] | A string or list of names of output node(s) (optional). If not specified, the last node is assumed to be the graph output. This may be useful when the graph has multiple outputs. | None |
layout | str | Layout of the tensors used internally in the model. One of "NHWC" or "NCHW". | 'NHWC' |
npu | str | The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine; if no NPU is installed, an error will be raised. Defaults to None. | None |

Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU |
torch.compile API¶
Functions¶
compile(model=None, *, dynamic=None, backend='inductor', options=None)¶
Optimizes the given model/function using TorchDynamo with the RBLN backend for execution on RBLN hardware.
This function compiles the input model to run efficiently on RBLN NPUs. It leverages TorchDynamo for tracing and the RBLN backend for generating optimized code tailored to RBLN hardware. To use this backend, ensure that the RBLN SDK is imported before calling torch.compile.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Callable | Module/function to optimize. | None |
dynamic | bool or None | Use dynamic shape tracing. The RBLN backend currently does not support dynamic shapes. | None |
backend | str or Callable | Backend to be used. | 'inductor' |
options | dict | A dictionary of options to pass to the backend. | None |
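A hedged sketch of this entry point, assuming PyTorch is available. The backend string "rbln" is an assumption (the page states only that the SDK registers a backend for torch.compile), so that path is commented and the module runs eagerly instead:

```python
import torch

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet().eval()

# Hypothetical RBLN path (backend name "rbln" is an assumption; the SDK must
# be imported first so the backend registers with TorchDynamo):
#
#   import rebel                                         # SDK module name assumed
#   compiled = torch.compile(model, backend="rbln")
#   y = compiled(torch.randn(1, 4))
#
# Eager execution of the same module:
y = model(torch.randn(1, 4))
print(tuple(y.shape))
```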
runtime API¶
Classes¶
RuntimeBase¶
A base class providing common functionality for both synchronous and asynchronous runtimes.
The RuntimeBase class serves as a foundation for the Runtime and AsyncRuntime classes. It encapsulates the shared methods and attributes that manage execution of the model.
Functions¶
model_description()¶
Returns a description of the model currently loaded in the runtime.
This method provides a summary of the model's architecture, including details about its inputs, outputs, and memory usage on the RBLN device.

Returns:
Name | Type | Description |
---|---|---|
str | str | A string containing the model's description. |
flush_reports()¶
Fetch and discard all pending reports from the runtime.
This method streams all available reports using _stream_reports() and then clears the internal _reports list.
get_reports()¶
Retrieve all pending reports from the runtime.
This method fetches all available reports and returns their contents while preserving their order.

Returns:
Type | Description |
---|---|
List[Dict[str, Any]] | A list of report dictionaries. |

Note
Timer reports are only available when the environment variable RBLN_RUNTIME_TIMER=1 is set. Without this setting, timer reports will not be generated.
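A sketch of enabling timer reports and fetching them. Setting the variable in the launching shell works as well; the retrieval call is commented since it needs a live runtime:

```python
import os

# Timer reports require this environment variable (see Note above):
os.environ["RBLN_RUNTIME_TIMER"] = "1"

# Hypothetical report retrieval (requires a live runtime):
#
#   for report in runtime.get_reports():
#       print(report)
print(os.environ["RBLN_RUNTIME_TIMER"])
```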
get_elapsed_times()¶
Calculate the average total execution time of operations in microseconds.
This method iterates over the available timer reports, sums their total values, and computes the average across all such reports.

Returns:
Name | Type | Description |
---|---|---|
float | float | Average execution time in microseconds, or 0.0 if no timer reports are available. |

Note
Timer reports are only available when the environment variable RBLN_RUNTIME_TIMER=1 is set. Without this setting, timer reports will not be generated.
get_elapsed_device_times()¶
Calculate the average device-side execution time of operations in microseconds.
This method iterates over the available timer reports, sums their total_device values, and computes the average across all such reports.

Returns:
Name | Type | Description |
---|---|---|
float | float | Average device execution time in microseconds, or 0.0 if no timer reports are available. |

Note
Timer reports are only available when the environment variable RBLN_RUNTIME_TIMER=1 is set. Without this setting, timer reports will not be generated.
Runtime¶
A Runtime object for executing a compiled neural network on an NPU.
Functions¶
__init__(compiled_model, *, device=0, tensor_type='np', path=None, activate_profiler=False)¶
Initializes a Runtime object for executing a compiled neural network on an NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
compiled_model | Union[str, RBLNCompiledModel] | The path to the compiled RBLN neural network file (*.rbln) or an instance of RBLNCompiledModel. | required |
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |
path | str | Deprecated. Use 'compiled_model' instead. | None |
activate_profiler | bool | Whether to activate profiling for this runtime instance. Defaults to False. | False |
run(*input_args, out=None, **input_kwargs)¶
Runs the compiled neural network with the given input tensors.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list of tensors to store the outputs. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new output tensors are allocated. | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. The return value depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the Runtime object's initialization. |
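A sketch of the out parameter for buffer reuse. The inference loop is commented since it needs a live runtime; the (1, 1000) output shape is illustrative only:

```python
import numpy as np

# Pre-allocate output buffers once and reuse them across calls; shapes must
# match the network's output shapes (the (1, 1000) shape here is illustrative):
out_buf = [np.empty((1, 1000), dtype=np.float32)]

# Hypothetical repeated inference (requires a live runtime):
#
#   for x in inputs:
#       runtime.run(x, out=out_buf)   # results written into out_buf[0]
print(out_buf[0].shape)
```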
forward(*input_args, out=None, **input_kwargs)¶
An alias for the run method.
This method is provided for compatibility with PyTorch's naming convention.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list of tensors to store the outputs. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new output tensors are allocated. | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network, as returned by the run method. |
__call__(*input_args, out=None, **input_kwargs)¶
Allows the Runtime object to be called as a function.
This method is provided for convenience and compatibility with common neural network frameworks.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list of tensors to store the outputs. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new output tensors are allocated. | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network, as returned by the run method. |
AsyncRuntime¶
An AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.
Functions¶
__init__(compiled_model, *, device=0, tensor_type='np', path=None, parallel=None)¶
Initializes an AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
compiled_model | Union[str, RBLNCompiledModel] | The path to the compiled RBLN neural network file (*.rbln) or an instance of RBLNCompiledModel. | required |
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used in the runtime (numpy.ndarray or torch.Tensor). Defaults to "np". | 'np' |
path | str | Deprecated. Use 'compiled_model' instead. | None |
parallel | int | The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, allowing it to process one input while the next is being prepared. This parameter helps manage the NPU's input buffer more efficiently, ensuring that the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime due to its non-blocking nature. Note: use parallel = 2 with caution, as it may not work on some models (e.g. LLMs). It's recommended to benchmark your specific use case to determine whether it provides a performance benefit. | None |
run(*input_args, **input_kwargs)¶
Runs the compiled neural network asynchronously with the given input tensors.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
AsyncTask | An asynchronous task object representing the neural network execution. The task object can be used to wait for the execution to finish. |
async_run(*input_args, **input_kwargs) async¶
Runs the compiled neural network asynchronously and returns the result awaitably.
This method is a coroutine that can be used with the await keyword to asynchronously run the neural network and retrieve the result.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | () |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor. | {} |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. The return type depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the AsyncRuntime object's initialization. |
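The awaiting pattern above can be sketched as follows. The SDK-dependent coroutine is commented (it needs a live AsyncRuntime `aruntime`); a stand-in coroutine demonstrates the same asyncio usage:

```python
import asyncio

# Hypothetical coroutine usage (requires a live AsyncRuntime `aruntime`):
#
#   async def infer(x):
#       return await aruntime.async_run(x)
#   result = asyncio.run(infer(x))
#
# The same awaiting pattern with a stand-in coroutine:
async def infer(x):
    await asyncio.sleep(0)   # stands in for aruntime.async_run(x)
    return x * 2

print(asyncio.run(infer(21)))
```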
AsyncTask¶
Functions¶
wait(timeout=None)¶
Waits for the asynchronous task to complete and returns the result.
This method blocks the calling thread until the task completes or the specified timeout is reached.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | Optional[float] | The maximum amount of time (in seconds) to wait for the task to complete. If None, the method waits indefinitely until the task completes. Defaults to None. | None |

Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. The return value depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the AsyncRuntime object's initialization. |
utility API¶
Functions¶
npu_is_available(device=0)¶
Return a bool indicating whether the RBLN device is currently available.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | Index of the NPU. Defaults to 0. | 0 |

Returns:
Type | Description |
---|---|
bool | A bool indicating whether the RBLN device is currently available. |
get_npu_name(device=0)¶
Return the name of the RBLN NPU.

Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | Index of the NPU. Defaults to 0. | 0 |

Returns:
Type | Description |
---|---|
str | The corresponding name of the NPU, e.g. "RBLN-CA12". |
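The two utilities above combine naturally into a best-effort device probe. A sketch; the SDK module name `rebel` is an assumption, and the helper degrades gracefully when the SDK is absent:

```python
def describe_npu(device: int = 0) -> str:
    """Best-effort probe using npu_is_available / get_npu_name; the `rebel`
    module name is an assumption for the RBLN SDK's Python package."""
    try:
        import rebel  # hypothetical SDK module name
    except ImportError:
        return "RBLN SDK not installed"
    if not rebel.npu_is_available(device=device):
        return f"no NPU available on device {device}"
    return rebel.get_npu_name(device=device)  # e.g. "RBLN-CA12"

print(describe_npu())
```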