
Python API

RBLN compile API

Classes

RBLNCompiledModel

A class that holds the compiled binaries. Instances of this class are created by the compile_from_* functions.

Functions
save(path)

Serialize the model and save it to disk as a .rbln file.

Parameters:

Name Type Description Default
path PathLike

Path where the serialized data is saved.

required
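The sketch below wraps save in a small helper that creates the parent directory and appends a .rbln suffix when the name lacks one. The helper save_compiled_model and the SupportsSave protocol are hypothetical. This reference does not confirm that save leaves missing directories uncreated, so the helper assumes it does and creates them first.

```python
import os
from pathlib import Path
from typing import Protocol, Union

PathLike = Union[str, os.PathLike]


class SupportsSave(Protocol):
    """Hypothetical protocol: any object with save(path), e.g. an RBLNCompiledModel."""

    def save(self, path: PathLike) -> None:
        ...


def save_compiled_model(compiled_model: SupportsSave, path: PathLike) -> Path:
    """Hypothetical helper: save a compiled model as a .rbln file.

    Assumes save() does not create missing parent directories, so they are
    created here first. A ".rbln" suffix is appended if the name lacks one.
    """
    target = Path(path)
    if target.suffix != ".rbln":
        target = target.with_name(target.name + ".rbln")
    target.parent.mkdir(parents=True, exist_ok=True)
    compiled_model.save(str(target))
    return target
```

For example, save_compiled_model(compiled_model, "artifacts/resnet50") would write artifacts/resnet50.rbln. You could later pass that path as compiled_model to the Runtime constructor.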
create_runtime(*, device=0, tensor_type='np')

Create a runtime from these binaries. This function is mutually exclusive with create_async_runtime: once you have created a runtime with create_runtime on an instance, you cannot call create_async_runtime on that instance.

Parameters:

Name Type Description Default
device int

The device ID of the NPU to use for execution. Defaults to 0.

0
tensor_type str

The object type of the tensor used in the run function. Possible values are:

  • "np": Uses np.ndarray type.
  • "pt": Uses torch.Tensor type.

Defaults to "np".

'np'

Returns:

Type Description
Runtime

Runtime object that can be run on the RBLN ATOM

create_async_runtime(*, device=0, tensor_type='np', parallel=None)

Create an asynchronous runtime from these binaries. This function is mutually exclusive with create_runtime: once you have created an asynchronous runtime with create_async_runtime on an instance, you cannot call create_runtime on that instance.

Parameters:

Name Type Description Default
device int

The device ID of the NPU to use for execution. Defaults to 0.

0
tensor_type str

The object type of the tensor used in the run function. Possible values are:

  • "np": Uses np.ndarray type.
  • "pt": Uses torch.Tensor type.

Defaults to "np".

'np'
parallel int

The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, allowing it to process one input while the next is being prepared. Possible values are:

  • 1: Uses a single thread to prepare inputs (default).
  • 2: Uses two threads to prepare inputs, enabling double buffering. This can potentially improve performance when input preparation is time-consuming, as one thread can prepare the next input while the NPU is still processing the current one.

This parameter helps manage the NPU's input buffer more efficiently, ensuring that the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime due to its non-blocking nature.

Note: Use parallel=2 with caution, as it may not work with some models (e.g., LLMs). Benchmark your specific use case to determine whether it provides a performance benefit.

None

Returns:

Type Description
AsyncRuntime

Asynchronous runtime object that can be run on the RBLN ATOM
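You cannot call both create_runtime and create_async_runtime on the same compiled model. If your code may need either kind of runtime, it can send both requests through one object that creates the runtime once. The RuntimeHolder class below is a hypothetical sketch. It forwards only the keyword arguments documented above: device, tensor_type, and, for the asynchronous case, parallel.

```python
from typing import Any, Optional


class RuntimeHolder:
    """Hypothetical wrapper that lazily creates one runtime per compiled model.

    create_runtime and create_async_runtime are mutually exclusive on a given
    compiled model, so only the method selected by use_async is ever called,
    and only once.
    """

    def __init__(self, compiled_model: Any, *, use_async: bool = False, **runtime_kwargs: Any) -> None:
        # parallel is documented only for the asynchronous runtime.
        if not use_async and "parallel" in runtime_kwargs:
            raise ValueError("parallel applies only to create_async_runtime")
        self._compiled_model = compiled_model
        self._use_async = use_async
        self._runtime_kwargs = runtime_kwargs  # e.g. device=0, tensor_type="pt"
        self._runtime: Optional[Any] = None

    def get(self) -> Any:
        """Return the runtime, creating it on first use."""
        if self._runtime is None:
            if self._use_async:
                self._runtime = self._compiled_model.create_async_runtime(**self._runtime_kwargs)
            else:
                self._runtime = self._compiled_model.create_runtime(**self._runtime_kwargs)
        return self._runtime
```

For example, RuntimeHolder(compiled_model, use_async=True, device=0).get() creates the AsyncRuntime on the first call and returns the same object on every later call.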

Functions

compile_from_torch(mod, input_info=None, example_inputs=None, *, npu=None, tensor_parallel_size=None)

Compile a model from torch.nn.Module.

Parameters:

Name Type Description Default
mod Module

A PyTorch module (torch.nn.Module) to compile.

required
input_info List[Tuple[str, List[int], DType]]

A list of input information, with each entry described as a triple (name, shape, dtype):

  • name : str
  • shape : List[int]
  • dtype : str or torch.dtype (e.g., "float32" or torch.float32)
None
example_inputs List[Tensor]

A list of example input torch tensors that can be used for tracing. If None, tracing may use default dummy inputs obtained from input_info.

None
npu str

The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If no NPU is installed on the host machine, an error will be raised. Defaults to None.

None
tensor_parallel_size int

The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None.

None

Returns:

Type Description
RBLNCompiledModel

Compiled model that can be run on the RBLN NPU
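Before you call compile_from_torch, a small builder can assemble input_info from a mapping of names to (shape, dtype) pairs. make_input_info is a hypothetical helper. Its check that every dimension is positive is a sanity check based on an assumption, not a documented rule. The helper also assumes the entries should follow the order of the model's forward arguments.

```python
from typing import Any, Dict, List, Sequence, Tuple

# (name, shape, dtype); dtype may be a string such as "float32" or a torch.dtype
InputInfo = List[Tuple[str, List[int], Any]]


def make_input_info(specs: Dict[str, Tuple[Sequence[int], Any]]) -> InputInfo:
    """Hypothetical helper: turn {name: (shape, dtype)} into input_info triples.

    Dict insertion order is kept, on the assumption that it should match the
    order of the model's forward() arguments.
    """
    info: InputInfo = []
    for name, (shape, dtype) in specs.items():
        dims = [int(d) for d in shape]
        # Sanity check (assumption): every dimension must be a concrete positive size.
        if any(d <= 0 for d in dims):
            raise ValueError(f"{name}: expected positive dimensions, got {dims}")
        info.append((name, dims, dtype))
    return info
```

Under those assumptions, a call might look like rebel.compile_from_torch(model, make_input_info({"x": ((1, 3, 224, 224), "float32")})), where model is your torch.nn.Module.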

compile_from_torchscript(mod, input_names=None, *, npu=None, tensor_parallel_size=None)

Compile a model from a torch.jit.ScriptModule, the result of the torch.jit.trace function.

Compiling a TorchScript model requires the model's input shape and dtype information. When you load a TorchScript model with torch.jit.load, this information is not restored by default. Pass the additional parameter _restore_shapes=True to restore it.

import torch
import rebel
mod = torch.jit.load("model.pt", _restore_shapes=True)
compiled_model = rebel.compile_from_torchscript(mod)

Parameters:

Name Type Description Default
mod TorchScript

A PyTorch JIT-traced model (torch.jit.ScriptModule).

required
input_names Optional[List[Optional[str]]]

A list of input names. If None, each name is derived from mod, using the name of the corresponding input of its forward function.

None
npu str

The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If no NPU is installed on the host machine, an error will be raised. Defaults to None.

None
tensor_parallel_size int

The number of NPU devices to use for tensor parallelism. If None, tensor parallelism will not be used. Defaults to None.

None

Returns:

Type Description
RBLNCompiledModel

Compiled model that can be run on the RBLN NPU

compile_from_tf_function(func, input_info, outputs=None, layout='NHWC', *, npu=None)

Compile a model from a tf.function. The input function must not be concretized with the get_concrete_function method.

Parameters:

Name Type Description Default
func GenericFunction

A TensorFlow function created with tf.function.

required
input_info List[Tuple[str, List[int], DType]]

A list of input information, with each entry described as a triple (name, shape, dtype). If a dtype is given as None, it is derived from the dtype parameter.

required
outputs Optional[Union[str, List[str]]]

The name of the output node, or a list of output node names (optional). If not specified, the last node is assumed to be the graph output. Specify outputs when the graph has multiple outputs.

None
layout str

Layout of the tensors used internally in the model: either "NHWC" or "NCHW".

'NHWC'
npu str

The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If no NPU is installed on the host machine, an error will be raised. Defaults to None.

None

Returns:

Type Description
RBLNCompiledModel

Compiled model that can be run on the RBLN NPU

compile_from_tf_graph_def(graph_def, outputs=None, layout='NHWC', *, npu=None)

Compile a model from a TensorFlow GraphDef. This function lets you compile legacy TensorFlow 1.x models. If you are using TensorFlow 2, we recommend compiling the model in its tf.function form with compile_from_tf_function.

Parameters:

Name Type Description Default
graph_def GraphDef

A TensorFlow graph definition in the form of a protocol buffer (GraphDef).

required
outputs Optional[Union[str, List[str]]]

The name of the output node, or a list of output node names (optional). If not specified, the last node is assumed to be the graph output. Specify outputs when the graph has multiple outputs.

None
layout str

Layout of the tensors used internally in the model: either "NHWC" or "NCHW".

'NHWC'
npu str

The identifier of the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If no NPU is installed on the host machine, an error will be raised. Defaults to None.

None

Returns:

Type Description
RBLNCompiledModel

Compiled model that can be run on the RBLN NPU

torch.compile API

Functions

compile(model=None, *, dynamic=None, backend='inductor', options=None)

Optimizes the given model/function using TorchDynamo with the RBLN backend for execution on RBLN hardware.

This function compiles the input model to run efficiently on RBLN NPUs. It leverages TorchDynamo for tracing and the RBLN backend for generating optimized code tailored to RBLN hardware. To use this backend, ensure that the RBLN SDK is imported before calling torch.compile.

Parameters:

Name Type Description Default
model Callable

Module/function to optimize

None
dynamic bool or None

Use dynamic shape tracing. The RBLN backend currently does not support dynamic shapes.

None
backend str or Callable

The backend to use.

  • To use the RBLN backend, set this to "rbln" and ensure import rebel is executed before calling torch.compile.
'inductor'
options dict

A dictionary of options to pass to the backend. Notable options for the rbln backend include:

  • cache_dir: the directory where compiled artifacts are stored.

  • npu: the target NPU for compilation. If None, the function will use the NPU installed on the host machine. If no NPU is installed on the host machine, an error will be raised. Defaults to None.

  • device: the device ID of the NPU to use for execution. Defaults to 0.

None
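The options above are plain dictionary entries. rbln_options is a hypothetical helper that includes only the keys you set, so the backend applies its own defaults for the rest. The options listed above are "notable" ones rather than a complete set. To pass other backend options, add them to the returned dict.

```python
from typing import Any, Dict, Optional


def rbln_options(
    cache_dir: Optional[str] = None,
    npu: Optional[str] = None,
    device: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the options dict for backend="rbln".

    Unset values are left out so the backend applies its documented defaults
    (the host's NPU for npu, 0 for device).
    """
    options: Dict[str, Any] = {}
    if cache_dir is not None:
        options["cache_dir"] = cache_dir
    if npu is not None:
        options["npu"] = npu
    if device is not None:
        if device < 0:
            raise ValueError("device must be a non-negative NPU device ID")
        options["device"] = device
    return options
```

For example: torch.compile(model, backend="rbln", options=rbln_options(cache_dir="./rbln_cache_dir"), dynamic=False).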

Example:

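import torch
import rebel  # noqa: F401  (importing rebel makes the "rbln" backend available)

compiled_model = torch.compile(model,  # model: the torch.nn.Module or function to optimize
                               backend="rbln",  # Specify the RBLN backend
                               options={"cache_dir": "./rbln_cache_dir"},  # Cache directory for compiled artifacts
                               dynamic=False)  # Disable dynamic shapes (not supported by RBLN backend)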

runtime API

Classes

RuntimeBase

A base class for runtimes, providing common functionality for both synchronous and asynchronous runtimes.

The RuntimeBase class serves as a foundational class for the Runtime and AsyncRuntime classes. It encapsulates shared methods and attributes that manage the execution of the model.

Functions
model_description()

Returns a description of the model currently loaded in the runtime.

This method provides a summary of the model's architecture, including details about its inputs, outputs, and memory usage on the RBLN device.

Returns:

Name Type Description
str str

A string containing the model's description.

Runtime

A Runtime object for executing a compiled neural network on an NPU.

Functions
__init__(compiled_model, *, device=0, tensor_type='np', path=None)

Initializes a Runtime object for executing a compiled neural network on an NPU.

Parameters:

Name Type Description Default
compiled_model Union[str, RBLNCompiledModel]

The path to the compiled rbln neural network file (*.rbln) or an instance of RBLNCompiledModel.

required
device int

The device ID of the NPU to use for execution. Defaults to 0.

0
tensor_type str

The object type of the tensor used in the run function. Possible values are:

  • "np": Uses np.ndarray type.
  • "pt": Uses torch.Tensor type.

Defaults to "np".

'np'
path str

Deprecated. Use 'compiled_model' instead.

None
run(*input_args, out=None, **input_kwargs)

Runs the compiled neural network with the given input tensors.

Parameters:

Name Type Description Default
*input_args Union[ndarray, Tensor]

Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

()
out Optional[List[Union[ndarray, Tensor]]]

An optional list or tensor to store the output tensors. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new tensors will be allocated to store the outputs.

None
**input_kwargs Union[ndarray, Tensor]

Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

{}

Returns:

Type Description
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]

The output tensor(s) of the neural network. The return value depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided when the Runtime object was initialized.
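Depending on the network, run returns either a single tensor or a list of tensors. Post-processing code can normalize the result to a list first so it can index outputs the same way for every model. as_output_list is a hypothetical helper. It also treats a tuple like a list, which is a defensive assumption rather than documented behavior.

```python
from typing import Any, List


def as_output_list(result: Any) -> List[Any]:
    """Hypothetical helper: always return run()'s output(s) as a list.

    A list (or, defensively, a tuple) is copied into a new list; any other
    value, such as a single numpy.ndarray or torch.Tensor, is wrapped.
    """
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]
```

For example, first_output = as_output_list(runtime.run(x))[0] works whether the network has one output or several.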

forward(*input_args, out=None, **input_kwargs)

An alias for the run method.

This method is provided for compatibility with PyTorch's naming convention.

Parameters:

Name Type Description Default
*input_args Union[ndarray, Tensor]

Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

()
out Optional[List[Union[ndarray, Tensor]]]

An optional list or tensor to store the output tensors. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new tensors will be allocated to store the outputs.

None
**input_kwargs Union[ndarray, Tensor]

Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

{}

Returns:

Type Description
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]

The output tensor(s) of the neural network, as returned by the run method.

__call__(*input_args, out=None, **input_kwargs)

Allows the Runtime object to be called as a function.

This method is provided for convenience and compatibility with common neural network frameworks.

Parameters:

Name Type Description Default
*input_args Union[ndarray, Tensor]

Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

()
out Optional[List[Union[ndarray, Tensor]]]

An optional list or tensor to store the output tensors. If provided, it must contain pre-allocated tensors with shapes matching the network's output shapes. If not provided or set to None, new tensors will be allocated to store the outputs.

None
**input_kwargs Union[ndarray, Tensor]

Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

{}

Returns:

Type Description
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]

The output tensor(s) of the neural network, as returned by the run method.

AsyncRuntime

An AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.

Functions
__init__(compiled_model, *, device=0, tensor_type='np', path=None, parallel=None)

Initializes an AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.

Parameters:

Name Type Description Default
compiled_model Union[str, RBLNCompiledModel]

The path to the compiled rbln neural network file (*.rbln) or an instance of RBLNCompiledModel.

required
device int

The device ID of the NPU to use for execution. Defaults to 0.

0
tensor_type str

The object type of the tensor used in the run function. Possible values are:

  • "np": Uses np.ndarray type.
  • "pt": Uses torch.Tensor type.

Defaults to "np".

'np'
path str

Deprecated. Use 'compiled_model' instead.

None
parallel int

The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, allowing it to process one input while the next is being prepared. Possible values are:

  • 1: Uses a single thread to prepare inputs (default).
  • 2: Uses two threads to prepare inputs, enabling double buffering. This can potentially improve performance when input preparation is time-consuming, as one thread can prepare the next input while the NPU is still processing the current one.

This parameter helps manage the NPU's input buffer more efficiently, ensuring that the next input is ready as soon as the current computation finishes. Only applicable to AsyncRuntime due to its non-blocking nature.

Note: Use parallel=2 with caution, as it may not work with some models (e.g., LLMs). Benchmark your specific use case to determine whether it provides a performance benefit.

None
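The note above recommends benchmarking parallel. One way to do this is to time the same workload with a runtime built with parallel=1 and again with parallel=2. time_async_batch is a hypothetical helper that relies only on AsyncRuntime.run returning an AsyncTask with a wait() method. It assumes the runtime accepts several submitted tasks before any of them is waited on.

```python
import statistics
import time
from typing import Any, Sequence, Tuple


def time_async_batch(runtime: Any, batches: Sequence[Tuple[Any, ...]], repeats: int = 5) -> float:
    """Hypothetical helper: median seconds to run every input tuple and wait.

    All tasks are submitted before any wait() call, so the runtime can prepare
    the next input while the NPU works on the current one.
    """
    def one_pass() -> float:
        start = time.perf_counter()
        tasks = [runtime.run(*args) for args in batches]  # enqueue without blocking
        for task in tasks:
            task.wait()  # block until each result is ready
        return time.perf_counter() - start

    one_pass()  # warm-up pass, excluded from the measurement
    return statistics.median([one_pass() for _ in range(repeats)])
```

For instance, you could measure AsyncRuntime("model.rbln", parallel=1), then repeat with a runtime constructed with parallel=2. Keep the inputs identical between the two runs.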
run(*input_args, **input_kwargs)

Runs the compiled neural network asynchronously with the given input tensors.

Parameters:

Name Type Description Default
*input_args Union[ndarray, Tensor]

Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

()
**input_kwargs Union[ndarray, Tensor]

Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

{}

Returns:

Type Description
AsyncTask

An asynchronous task object representing the neural network execution. The task object can be used to wait for the neural network execution to finish.

async_run(*input_args, **input_kwargs) async

Runs the compiled neural network asynchronously and returns a result that can be awaited.

This method is a coroutine that can be used with the await keyword to asynchronously run the neural network and retrieve the result.

Parameters:

Name Type Description Default
*input_args Union[ndarray, Tensor]

Variable length argument list of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

()
**input_kwargs Union[ndarray, Tensor]

Arbitrary keyword arguments of input tensors. Each argument should be either a numpy.ndarray or a torch.Tensor.

{}

Returns:

Type Description
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]

The output tensor(s) of the neural network. The return type depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided during the AsyncRuntime object's initialization.
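Because async_run is a coroutine, you can await several calls together with asyncio.gather, which returns results in the order the coroutines were passed. run_concurrently is a hypothetical helper. This reference does not say whether AsyncRuntime overlaps these calls on the NPU or queues them one after another.

```python
import asyncio
from typing import Any, List, Sequence, Tuple


async def run_concurrently(runtime: Any, batches: Sequence[Tuple[Any, ...]]) -> List[Any]:
    """Hypothetical helper: await runtime.async_run for each input tuple.

    asyncio.gather returns results in the same order as batches.
    """
    coroutines = [runtime.async_run(*args) for args in batches]
    return list(await asyncio.gather(*coroutines))
```

Inside an async function you could write outputs = await run_concurrently(async_runtime, [(x0,), (x1,)]), where async_runtime is an AsyncRuntime and x0 and x1 are input tensors.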

AsyncTask

Functions
wait(timeout=None)

Waits for the asynchronous task to complete and returns the result.

This method blocks the calling thread until the task is completed or the specified timeout is reached.

Parameters:

Name Type Description Default
timeout Optional[float]

The maximum amount of time (in seconds) to wait for the task to complete. If None, the method will wait indefinitely until the task is completed. Defaults to None.

None

Returns:

Type Description
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]]

The output tensor(s) of the neural network. The return value depends on the network's architecture and can be either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type provided when the AsyncRuntime object was initialized.
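The timeout argument of wait applies to one task at a time. If several AsyncTask objects must all finish within one time budget, you can recompute the remaining budget before each wait. wait_all is a hypothetical helper. What wait raises when the timeout expires is not documented here, so the sketch lets any such exception propagate unchanged. It also assumes that a timeout of 0.0 means "do not block".

```python
import time
from typing import Any, List, Optional, Sequence


def wait_all(tasks: Sequence[Any], deadline_s: Optional[float] = None) -> List[Any]:
    """Hypothetical helper: wait on AsyncTask objects under one shared deadline.

    deadline_s is the total budget in seconds for all tasks (None waits
    indefinitely). Each wait() receives only the time still remaining;
    any exception wait() raises on timeout is propagated unchanged.
    """
    end = None if deadline_s is None else time.monotonic() + deadline_s
    results = []
    for task in tasks:
        # Once the budget is used up, remaining is 0.0 (assumed to mean "do not block").
        remaining = None if end is None else max(0.0, end - time.monotonic())
        results.append(task.wait(timeout=remaining))
    return results
```

For example: tasks = [async_runtime.run(x) for x in inputs], then outputs = wait_all(tasks, deadline_s=2.0).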

utility API

Functions

npu_is_available(device=0)

Return a bool indicating whether the RBLN device is currently available.

Parameters:

Name Type Description Default
device int

Index of the NPU. Defaults to 0.

0

Returns:

Type Description
bool

A bool indicating whether the RBLN device is currently available

get_npu_name(device=0)

Return the name of the RBLN NPU.

Parameters:

Name Type Description Default
device int

Index of the NPU. Defaults to 0.

0

Returns:

Type Description
str

The name of the corresponding NPU, e.g., "RBLN-CA12".
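You can combine npu_is_available and get_npu_name to list the usable devices on a host, for example when choosing the device argument of create_runtime. list_rbln_devices is a hypothetical helper that takes the two functions as parameters, so it can be tested without hardware. It assumes npu_is_available returns False, rather than raising, for an index with no device. max_devices is an arbitrary probe limit, not a documented one.

```python
from typing import Callable, Dict


def list_rbln_devices(
    is_available: Callable[[int], bool],
    get_name: Callable[[int], str],
    max_devices: int = 16,
) -> Dict[int, str]:
    """Hypothetical helper: map each available device index to its NPU name.

    Probes indices 0..max_devices-1 and keeps those for which is_available
    returns True.
    """
    return {d: get_name(d) for d in range(max_devices) if is_available(d)}
```

With the SDK installed, you might call this as list_rbln_devices(rebel.npu_is_available, rebel.get_npu_name), assuming both functions are exposed at the top level of the rebel package.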