Python API¶

`RBLN compile API`¶

Classes¶

`RBLNCompiledModel` ¶

A class that holds the compiled binaries. Instances of this class are generated by the compile_from_* functions.

Functions¶

`save(path)` ¶

Serialize the model and save it to disk as a .rbln file.

Parameters:

Name	Type	Description	Default
`path`	`PathLike`	Path to save serialized data	required

`create_runtime(*, device=0, tensor_type='np')` ¶

Create a runtime with this binary. Note that this function and create_async_runtime are mutually exclusive: once you have created a runtime by calling create_runtime on an instance, you cannot call create_async_runtime on the same instance.

Parameters:

Name	Type	Description	Default
`device`	`int`	The device ID of the NPU to use for execution. Defaults to 0.	`0`
`tensor_type`	`str`	The object type of the tensor used in the `run` function. Possible values are: "np": Uses np.ndarray type. "pt": Uses torch.Tensor type. Defaults to "np".	`'np'`

Returns:

Type	Description
`Runtime`	Runtime object that can be run on the RBLN ATOM
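A minimal sketch of the save-then-run flow. It assumes `compiled_model` is an `RBLNCompiledModel` returned by one of the `compile_from_*` functions and that an NPU is present at device 0. The helper name `save_then_run` is hypothetical.

```python
def save_then_run(compiled_model, path, *inputs):
    """Save a compiled model to `path`, then run `inputs` through a synchronous runtime.

    Hypothetical helper; `compiled_model` is assumed to be an RBLNCompiledModel.
    """
    # Write the compiled binary to disk as a .rbln file for later reuse.
    compiled_model.save(path)
    # create_runtime and create_async_runtime are mutually exclusive per instance:
    # after this call, this instance can no longer use create_async_runtime().
    runtime = compiled_model.create_runtime(device=0, tensor_type="np")
    return runtime.run(*inputs)
```

The saved `.rbln` file can later be loaded directly, since `Runtime` also accepts a path to a compiled `.rbln` file.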

`create_async_runtime(*, device=0, tensor_type='np', parallel=None)` ¶

Create an asynchronous runtime with this binary. Note that this function and create_runtime are mutually exclusive: once you have created an asynchronous runtime by calling create_async_runtime on an instance, you cannot call create_runtime on the same instance.

Parameters:

Name	Type	Description	Default
`device`	`int`	The device ID of the NPU to use for execution. Defaults to 0.	`0`
`tensor_type`	`str`	The object type of the tensor used in the `run` function. Possible values are: "np": Uses np.ndarray type. "pt": Uses torch.Tensor type. Defaults to "np".	`'np'`
`parallel`	`int`	The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, so it can process one input while the next is being prepared. Possible values are: 1: Uses a single thread to prepare inputs (default). 2: Uses two threads to prepare inputs, enabling double buffering. This can improve performance when input preparation is time-consuming, because one thread can prepare the next input while the NPU is still processing the current one, so the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime because of its non-blocking nature. Note: Use parallel=2 with caution, as it may not work with some models (e.g., LLMs). Benchmark your specific use case to determine whether it provides a performance benefit.	`None`

Returns:

Type	Description
`AsyncRuntime`	Asynchronous runtime object that can be run on the RBLN ATOM
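The description above recommends benchmarking before relying on `parallel=2`. The following sketch times one pipelined pass for each setting. It assumes `make_runtime` is a caller-supplied factory, for example `lambda p: rebel.AsyncRuntime("model.rbln", parallel=p)`. The helper name `benchmark_parallel` is hypothetical.

```python
import time


def benchmark_parallel(make_runtime, batches, parallel_values=(1, 2)):
    """Return {parallel: seconds} for one pipelined pass over `batches`.

    `make_runtime(parallel)` is a caller-supplied factory (an assumption), e.g.
    lambda p: rebel.AsyncRuntime("model.rbln", parallel=p).
    """
    timings = {}
    for parallel in parallel_values:
        runtime = make_runtime(parallel)
        start = time.perf_counter()
        # Queue every batch first so input preparation can overlap NPU execution.
        tasks = [runtime.run(*batch) for batch in batches]
        for task in tasks:
            task.wait()  # block until each AsyncTask has finished
        timings[parallel] = time.perf_counter() - start
        del runtime  # drop the reference before creating the next runtime
    return timings
```

For more stable numbers, you would likely add a warm-up pass and repeat each measurement several times.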

`get_total_device_alloc(parallel=1)` ¶

Retrieves the total device memory allocation (in bytes) required for the compiled graph across all NPUs.

Parameters:

Name	Type	Description	Default
`parallel`	`int`	If parallel > 1, the returned allocation size accounts for additional buffer requirements when operating in non-blocking (asynchronous) mode. Defaults to 1.	`1`

Returns:

Type	Description
`int`	The total device memory allocation (in bytes) required for the compiled graph.
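You can compare the allocation across `parallel` settings to see how much extra device memory the asynchronous buffers require. This is a sketch: the helper `device_alloc_mib` is hypothetical, and reporting in MiB (2**20 bytes) is an arbitrary choice.

```python
def device_alloc_mib(compiled_model, parallel_values=(1, 2)):
    """Return {parallel: MiB} using RBLNCompiledModel.get_total_device_alloc."""
    return {
        p: compiled_model.get_total_device_alloc(parallel=p) / 2**20
        for p in parallel_values
    }
```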

Functions¶

`compile_from_torch(mod, input_info=None, example_inputs=None, *, npu=None, tensor_parallel_size=None)` ¶

Compile a model from torch.nn.Module.

Parameters:

Name	Type	Description	Default
`mod`	`Module`	A PyTorch module.	required
`input_info`	`List[Tuple[str, List[int], DType]]`	A list of input information, with each entry given as a (name, shape, dtype) triple, where name is a `str`, shape is a `List[int]`, and dtype is a `str` or `torch.dtype` (e.g., "float32" or torch.float32).	`None`
`example_inputs`	`List[Tensor]`	A list of example input torch tensors that can be used for tracing. If None, tracing may use dummy inputs generated from input_info.	`None`
`npu`	`str`	The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If an NPU is not installed on the host machine, an error will be raised. Defaults to None.	`None`
`tensor_parallel_size`	`int`	The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None.	`None`

Returns:

Type	Description
`RBLNCompiledModel`	Compiled model that can be run on the RBLN NPU
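The `input_info` argument is a list of `(name, shape, dtype)` triples. The sketch below builds that list from a dictionary and passes it to `compile_from_torch`. The helpers `make_input_info` and `compile_module` are hypothetical, and the example shape `[1, 3, 224, 224]` is only illustrative. `rebel` is imported lazily so the helpers can be defined without the SDK installed.

```python
def make_input_info(specs):
    """Convert {name: (shape, dtype)} into [(name, shape, dtype), ...] triples."""
    return [(name, list(shape), dtype) for name, (shape, dtype) in specs.items()]


def compile_module(module, specs, npu=None):
    """Compile a torch.nn.Module with rebel.compile_from_torch (requires the RBLN SDK)."""
    import rebel

    input_info = make_input_info(specs)
    # npu=None targets the NPU installed on the host machine.
    return rebel.compile_from_torch(module, input_info=input_info, npu=npu)


# Example spec: one float32 image batch named "x".
specs = {"x": ((1, 3, 224, 224), "float32")}
```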

`compile_from_torchscript(mod, input_names=None, *, npu=None, tensor_parallel_size=None)` ¶

Compile a model from a torch.jit.ScriptModule, such as the result of torch.jit.trace.

Note that compiling a TorchScript model requires the model's input shape and dtype information. When a TorchScript model is loaded with torch.jit.load, this information is not restored by default. Pass the additional parameter _restore_shapes=True to restore it.

import torch
import rebel

mod = torch.jit.load("model.pt", _restore_shapes=True)
compiled_model = rebel.compile_from_torchscript(mod)

Parameters:

Name	Type	Description	Default
`mod`	`TorchScript`	A PyTorch jit-traced model.	required
`input_names`	`Optional[List[Optional[str]]]`	A list of input names. If `input_names` is `None`, the names are derived from `mod`, using the names of the corresponding inputs of its `forward` function.	`None`
`npu`	`str`	The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If an NPU is not installed on the host machine, an error will be raised. Defaults to None.	`None`
`tensor_parallel_size`	`int`	The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None.	`None`

Returns:

Type	Description
`RBLNCompiledModel`	Compiled model that can be run on the RBLN NPU
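The following sketch shows the full trace, save, load, and compile round trip. It assumes `model` is a `torch.nn.Module` and `example_input` is a matching tensor. The helper name and default file name are hypothetical, and both `torch` and `rebel` are imported lazily.

```python
def trace_and_compile(model, example_input, path="model.pt", npu=None):
    """Trace `model`, save it as TorchScript, reload it with shapes, and compile it."""
    import torch
    import rebel

    traced = torch.jit.trace(model.eval(), example_input)
    torch.jit.save(traced, path)
    # Without _restore_shapes=True, the loaded module lacks input shape/dtype info.
    loaded = torch.jit.load(path, _restore_shapes=True)
    return rebel.compile_from_torchscript(loaded, npu=npu)
```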

`compile_from_tf_function(func, input_info, outputs=None, layout='NHWC', *, npu=None)` ¶

Compile a model from a tf.function. Note that the function must be passed as is; do not concretize it first with the get_concrete_function method.

Parameters:

Name	Type	Description	Default
`func`	`GenericFunction`	A TensorFlow function.	required
`input_info`	`List[Tuple[str, List[int], DType]]`	A list of input information, with each entry given as a (name, shape, dtype) triple. If the dtype is specified as `None`, it is derived from the `dtype` parameter.	required
`outputs`	`Optional[Union[str, List[str]]]`	The name of the output node, or a list of output node names (optional). If not specified, the last node is assumed to be the graph output. Specifying this is useful when the graph has multiple outputs.	`None`
`layout`	`str`	Layout of the tensor used internally in the model. One of "NHWC" or "NCHW".	`'NHWC'`
`npu`	`str`	The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If an NPU is not installed on the host machine, an error will be raised. Defaults to None.	`None`

Returns:

Type	Description
`RBLNCompiledModel`	Compiled model that can be run on the RBLN NPU
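The following sketch wraps a Keras model in `tf.function` and passes the `tf.function` object itself to `compile_from_tf_function`, rather than the result of `get_concrete_function`. The input name `"x"`, the NHWC shape, and the use of `tf.float32` as the dtype entry are assumptions. TensorFlow and `rebel` are imported lazily.

```python
def compile_keras_model(keras_model, npu=None):
    """Compile a Keras model via rebel.compile_from_tf_function (sketch)."""
    import tensorflow as tf
    import rebel

    @tf.function
    def forward(x):
        return keras_model(x)

    # (name, shape, dtype) triple; an NHWC input shape is assumed here.
    input_info = [("x", [1, 224, 224, 3], tf.float32)]
    # Pass the tf.function itself, not forward.get_concrete_function(...).
    return rebel.compile_from_tf_function(forward, input_info, npu=npu)
```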

`compile_from_tf_graph_def(graph_def, outputs=None, layout='NHWC', *, npu=None)` ¶

Compile a model from a TensorFlow GraphDef. Use this function to compile legacy TensorFlow 1.x models. If you use TensorFlow 2 by default, we recommend compiling the model in its function form with compile_from_tf_function.

Parameters:

Name	Type	Description	Default
`graph_def`	`GraphDef`	A TensorFlow graph definition in protocol buffer form.	required
`outputs`	`Optional[Union[str, List[str]]]`	The name of the output node, or a list of output node names (optional). If not specified, the last node is assumed to be the graph output. Specifying this is useful when the graph has multiple outputs.	`None`
`layout`	`str`	Layout of the tensor used internally in the model. One of "NHWC" or "NCHW".	`'NHWC'`
`npu`	`str`	The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If an NPU is not installed on the host machine, an error will be raised. Defaults to None.	`None`

Returns:

Type	Description
`RBLNCompiledModel`	Compiled model that can be run on the RBLN NPU

`torch.compile API`¶

Functions¶

`compile(model=None, *, dynamic=None, backend='inductor', options=None)` ¶

Optimizes the given model/function using TorchDynamo with the RBLN backend for execution on RBLN hardware.

This function compiles the input model to run efficiently on RBLN NPUs. It leverages TorchDynamo for tracing and the RBLN backend for generating optimized code tailored to RBLN hardware. To use this backend, ensure that the RBLN SDK is imported before calling torch.compile.

Parameters:

Name	Type	Description	Default
`model`	`Callable`	Module/function to optimize	`None`
`dynamic`	`bool or None`	Use dynamic shape tracing. The RBLN backend currently does not support dynamic shapes.	`None`
`backend`	`str or Callable`	The backend to use. To use the RBLN backend, set this to `"rbln"` and ensure `import rebel` is executed before calling `torch.compile`.	`'inductor'`
`options`	`dict`	A dictionary of options to pass to the backend. Notable options for the rbln backend include the following. `cache_dir` specifies the directory where compiled artifacts are stored. `mode` specifies the compilation mode. If set to 'strict', the model is compiled in strict mode, which ensures that it is fully exported. If None, a compilation failure automatically falls back to eager mode. `npu` specifies the target NPU for compilation. If None, the function uses the NPU installed on the host machine, and if no NPU is installed, an error is raised. Defaults to None. `device` specifies the device ID of the NPU to use for execution. Defaults to 0.	`None`

Example:

import torch
import rebel  # the RBLN SDK must be imported before calling torch.compile

compiled_model = torch.compile(model,
                               backend="rbln",  # Specify the RBLN backend
                               options={"cache_dir": "./rbln_cache_dir"},  # Cache directory for compiled artifacts
                               dynamic=False)  # Disable dynamic shapes (not supported by RBLN backend)

`runtime API`¶

Classes¶

`RuntimeBase` ¶

A base class for runtime, providing common functionalities for both synchronous and asynchronous runtime.

The RuntimeBase class serves as a foundational class for the Runtime and AsyncRuntime classes. It encapsulates shared methods and attributes that manage the execution of the model.

Functions¶

`model_description()` ¶

Returns a description of the model currently loaded in the runtime.

This method provides a summary of the model's architecture, including details about its inputs, outputs, and memory usage on the RBLN device.

Returns:

Type	Description
`str`	A string containing the model's description.

`Runtime` ¶

A Runtime object for executing a compiled neural network on an NPU.

Functions¶

`__init__(compiled_model, *, device=0, tensor_type='np', path=None, activate_profiler=False)` ¶

Initializes a Runtime object for executing a compiled neural network on an NPU.

Parameters:

Name	Type	Description	Default
`compiled_model`	`Union[str, RBLNCompiledModel]`	The path to the compiled rbln neural network file (*.rbln) or an instance of RBLNCompiledModel.	required
`device`	`int`	The device ID of the NPU to use for execution. Defaults to 0.	`0`
`tensor_type`	`str`	The object type of the tensor used in the `run` function. Possible values are: "np": Uses np.ndarray type. "pt": Uses torch.Tensor type. Defaults to "np".	`'np'`
`path`	`str`	Deprecated. Use 'compiled_model' instead.	`None`
`activate_profiler`	`bool`	Whether to activate profiling for this runtime instance. If set to `True`, profiling information (e.g., execution times) will be collected during execution. Useful for debugging and performance tuning. Defaults to `False`.	`False`

`run(*input_args, out=None, **input_kwargs)` ¶

Runs the compiled neural network with the given input tensors.

Parameters:

Name	Type	Description	Default
`*input_args`	`Union[ndarray, Tensor]`	Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`()`
`out`	`Optional[List[Union[ndarray, Tensor]]]`	An optional list or tensor to store the output tensors. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to `None`, new tensors will be allocated to store the outputs.	`None`
`**input_kwargs`	`Union[ndarray, Tensor]`	Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`{}`

Returns:

Type	Description
`Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]`	The output tensor(s) of the neural network. The return type depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the Runtime object's initialization.
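When the same network runs repeatedly, the `out` parameter lets you allocate output buffers once and reuse them. The sketch below assumes `tensor_type="np"`, caller-supplied output shapes, and a float32 output dtype. The helper name is hypothetical.

```python
import numpy as np


def run_batches(runtime, batches, output_shapes, dtype=np.float32):
    """Run each batch through `runtime`, reusing one set of output buffers.

    `output_shapes` must match the network's output shapes (supplied by the caller).
    """
    outs = [np.empty(shape, dtype=dtype) for shape in output_shapes]
    results = []
    for batch in batches:
        runtime.run(*batch, out=outs)  # outputs are written into `outs`
        # Copy, because the next call overwrites the shared buffers.
        results.append([o.copy() for o in outs])
    return results
```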

`forward(*input_args, out=None, **input_kwargs)` ¶

An alias for the run method.

This method is provided for compatibility with PyTorch's naming convention.

Parameters:

Name	Type	Description	Default
`*input_args`	`Union[ndarray, Tensor]`	Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`()`
`out`	`Optional[List[Union[ndarray, Tensor]]]`	An optional list or tensor to store the output tensors. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to `None`, new tensors will be allocated to store the outputs.	`None`
`**input_kwargs`	`Union[ndarray, Tensor]`	Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`{}`

Returns:

Type	Description
`Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]`	The output tensor(s) of the neural network, as returned by the `run` method.

`__call__(*input_args, out=None, **input_kwargs)` ¶

Allows the Runtime object to be called as a function.

This method is provided for convenience and compatibility with common neural network frameworks.

Parameters:

Name	Type	Description	Default
`*input_args`	`Union[ndarray, Tensor]`	Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`()`
`out`	`Optional[List[Union[ndarray, Tensor]]]`	An optional list or tensor to store the output tensors. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to `None`, new tensors will be allocated to store the outputs.	`None`
`**input_kwargs`	`Union[ndarray, Tensor]`	Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`{}`

Returns:

Type	Description
`Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]`	The output tensor(s) of the neural network, as returned by the `run` method.

`AsyncRuntime` ¶

An AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.

Functions¶

`__init__(compiled_model, *, device=0, tensor_type='np', path=None, parallel=None)` ¶

Initializes an AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.

Parameters:

Name	Type	Description	Default
`compiled_model`	`Union[str, RBLNCompiledModel]`	The path to the compiled rbln neural network file (*.rbln) or an instance of RBLNCompiledModel.	required
`device`	`int`	The device ID of the NPU to use for execution. Defaults to 0.	`0`
`tensor_type`	`str`	The object type of the tensor used in the `run` function. Possible values are: "np": Uses np.ndarray type. "pt": Uses torch.Tensor type. Defaults to "np".	`'np'`
`path`	`str`	Deprecated. Use 'compiled_model' instead.	`None`
`parallel`	`int`	The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, so it can process one input while the next is being prepared. Possible values are: 1: Uses a single thread to prepare inputs (default). 2: Uses two threads to prepare inputs, enabling double buffering. This can improve performance when input preparation is time-consuming, because one thread can prepare the next input while the NPU is still processing the current one, so the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime because of its non-blocking nature. Note: Use parallel=2 with caution, as it may not work with some models (e.g., LLMs). Benchmark your specific use case to determine whether it provides a performance benefit.	`None`

`run(*input_args, **input_kwargs)` ¶

Runs the compiled neural network asynchronously with the given input tensors.

Parameters:

Name	Type	Description	Default
`*input_args`	`Union[ndarray, Tensor]`	Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`()`
`**input_kwargs`	`Union[ndarray, Tensor]`	Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`{}`

Returns:

Type	Description
`AsyncTask`	An asynchronous task object representing the neural network execution. The task object can be used to wait for the neural network execution to finish.
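Because `run` returns immediately with an `AsyncTask`, you can queue several inputs before collecting any result. This is a minimal sketch, assuming `async_runtime` is an `AsyncRuntime` (for example, one created with `create_async_runtime`). The helper name is hypothetical.

```python
def run_pipelined(async_runtime, batches, timeout=None):
    """Queue all batches, then collect results in submission order."""
    # run() does not block, so all inputs are queued before any wait().
    tasks = [async_runtime.run(*batch) for batch in batches]
    # AsyncTask.wait() blocks until that task finishes (or `timeout` seconds elapse).
    return [task.wait(timeout=timeout) for task in tasks]
```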

`async_run(*input_args, **input_kwargs)` `async` ¶

Runs the compiled neural network asynchronously; the result is obtained by awaiting this coroutine.

This method is a coroutine that can be used with the await keyword to asynchronously run the neural network and retrieve the result.

Parameters:

Name	Type	Description	Default
`*input_args`	`Union[ndarray, Tensor]`	Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`()`
`**input_kwargs`	`Union[ndarray, Tensor]`	Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.	`{}`

Returns:

Type	Description
`Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]`	The output tensor(s) of the neural network. The return type depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the AsyncRuntime object's initialization.
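Inside a coroutine, you can await `async_run` directly, and `asyncio.gather` collects several results in input order. This is a minimal sketch. The helper name is hypothetical, and `async_runtime` is assumed to be an `AsyncRuntime`.

```python
import asyncio


async def infer_all(async_runtime, batches):
    """Await async_run for every batch; gather preserves the input order."""
    return await asyncio.gather(*(async_runtime.async_run(*b) for b in batches))
```

From synchronous code, this could be driven with `asyncio.run(infer_all(runtime, batches))`.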

`AsyncTask` ¶

Functions¶

`wait(timeout=None)` ¶

Waits for the asynchronous task to complete and returns the result.

This method blocks the calling thread until the task is completed or the specified timeout is reached.

Parameters:

Name	Type	Description	Default
`timeout`	`Optional[float]`	The maximum amount of time (in seconds) to wait for the task to complete. If None, the method will wait indefinitely until the task is completed. Defaults to None.	`None`

Returns:

Type	Description
`Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]`	The output tensor(s) of the neural network. The return type depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the AsyncRuntime object's initialization.

`utility API`¶

Functions¶

`npu_is_available(device=0)` ¶

Return a bool indicating whether the RBLN device is currently available.

Parameters:

Name	Type	Description	Default
`device`	`int`	Index of the NPU. Defaults to 0.	`0`

Returns:

Type	Description
`bool`	A bool indicating whether the RBLN device is currently available

`get_npu_name(device=0)` ¶

Return the name of the RBLN NPU.

Parameters:

Name	Type	Description	Default
`device`	`int`	Index of the NPU. Defaults to 0.	`0`

Returns:

Type	Description
`str`	The name of the NPU, e.g., "RBLN-CA12".
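You can combine the two utility functions to pick a device before creating a runtime. In this sketch, the candidate range and the helper name are hypothetical. The availability and name lookups are injectable so the logic can be exercised without hardware; by default they fall back to `rebel.npu_is_available` and `rebel.get_npu_name`. Whether `npu_is_available` returns False or raises for an index with no installed device is not stated above, so the candidate range should match your machine.

```python
def first_available_npu(candidates=range(4), is_available=None, get_name=None):
    """Return (device_id, npu_name) for the first available NPU, or None."""
    if is_available is None or get_name is None:
        import rebel  # RBLN SDK, used only when no lookups are injected

        is_available = is_available or rebel.npu_is_available
        get_name = get_name or rebel.get_npu_name
    for device in candidates:
        if is_available(device):
            return device, get_name(device)
    return None
```

You can then pass the returned device ID as `device=` to `create_runtime` or `create_async_runtime`.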
