Python API
The Python API documentation is written in English for clarity.
RBLN Compile API
Classes
RBLNCompiledModel
A class that holds the compiled binaries. Instances of this class are returned by the compile_from_* functions.
Functions
save(path)
Serializes the model and saves it to disk as an .rbln file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | PathLike | Path to which the serialized data is saved. | required |
create_runtime(*, device=0, tensor_type='np')
Creates a runtime from this compiled binary.
This function is mutually exclusive with create_async_runtime. Once you have created a runtime from an instance with create_runtime, you cannot call create_async_runtime on that instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used by the runtime. It determines whether outputs are numpy.ndarray or torch.Tensor. Defaults to "np". | 'np' |
Returns:
Type | Description |
---|---|
Runtime | Runtime object that can be run on the RBLN ATOM. |
create_async_runtime(*, device=0, tensor_type='np', parallel=None)
Creates an asynchronous runtime from this compiled binary.
This function is mutually exclusive with create_runtime. Once you have created an asynchronous runtime from an instance with create_async_runtime, you cannot call create_runtime on that instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used by the runtime. It determines whether outputs are numpy.ndarray or torch.Tensor. Defaults to "np". | 'np' |
parallel | int | The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, so it can process one input while the next one is being prepared. This parameter helps manage the NPU's input buffer more efficiently by ensuring that the next input is ready as soon as the current computation finishes. It applies only to AsyncRuntime because of its non-blocking nature. Note: use parallel=2 with caution, as it may not work with some models (e.g., LLMs). Benchmark your specific use case to determine whether it improves performance. | None |
Returns:
Type | Description |
---|---|
AsyncRuntime | Asynchronous runtime object that can be run on the RBLN ATOM. |
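create_runtime and create_async_runtime are mutually exclusive on a given compiled model. It can therefore help to decide the runtime kind in one place. The following is a minimal sketch of such a guard. RuntimeChooser and its make_sync/make_async methods are hypothetical names, not part of the RBLN SDK. The wrapper only assumes an object that exposes the two documented methods.

```python
class RuntimeChooser:
    """Hypothetical helper (not part of the RBLN SDK).

    Wraps an object exposing the documented create_runtime() and
    create_async_runtime() methods, and refuses to create both kinds of
    runtime from the same compiled model.
    """

    def __init__(self, compiled_model):
        self._compiled_model = compiled_model
        self._kind = None  # becomes "sync" or "async" after the first runtime

    def _claim(self, kind):
        # Reject a second runtime of the *other* kind before calling the SDK.
        if self._kind is not None and self._kind != kind:
            raise RuntimeError(
                f"a {self._kind} runtime was already created from this compiled "
                "model; create_runtime and create_async_runtime are exclusive"
            )
        self._kind = kind

    def make_sync(self, *, device=0, tensor_type="np"):
        self._claim("sync")
        return self._compiled_model.create_runtime(device=device, tensor_type=tensor_type)

    def make_async(self, *, device=0, tensor_type="np", parallel=None):
        self._claim("async")
        return self._compiled_model.create_async_runtime(
            device=device, tensor_type=tensor_type, parallel=parallel
        )
```

For example, you might call RuntimeChooser(compiled).make_sync(device=0), where compiled is the result of a compile_from_* call. A later make_async() on the same chooser raises a RuntimeError before the SDK is called.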
get_total_device_alloc(parallel=1)
Returns the total device memory allocation (in bytes) that the compiled graph requires across all NPUs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parallel | int | If parallel > 1, the returned size also accounts for the additional buffers required in non-blocking (asynchronous) mode. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
int | The total device memory allocation (in bytes) required for the compiled graph. |
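A parallel value greater than 1 adds buffer requirements. Comparing the two results therefore shows how much extra device memory asynchronous mode costs. Below is a small sketch that assumes only the documented get_total_device_alloc(parallel=...) method. async_buffer_overhead and format_bytes are hypothetical helpers.

```python
def async_buffer_overhead(get_total_device_alloc, parallel=2):
    """Return the extra device memory (bytes) needed for a given `parallel`.

    `get_total_device_alloc` is expected to behave like the documented
    RBLNCompiledModel.get_total_device_alloc bound method.
    """
    if parallel < 1:
        raise ValueError("parallel must be >= 1")
    baseline = get_total_device_alloc(parallel=1)       # blocking-mode requirement
    with_parallel = get_total_device_alloc(parallel=parallel)
    return with_parallel - baseline


def format_bytes(num_bytes):
    """Format a byte count as MiB with two decimal places."""
    return f"{num_bytes / (1024 ** 2):.2f} MiB"
```

With a compiled model, you might call format_bytes(async_buffer_overhead(compiled.get_total_device_alloc)) before deciding whether to use parallel=2.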
Functions
compile_from_torch(mod, input_info=None, example_inputs=None, *, npu=None, tensor_parallel_size=None)
Compiles a model from a torch.nn.Module.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mod | Module | The PyTorch module to compile. | required |
input_info | List[Tuple[str, List[int], DType]] | A list of input information, with each input described as a triple (name, shape, dtype). | None |
example_inputs | List[Tensor] | A list of example input torch tensors that can be used for tracing. If None, tracing may use default dummy inputs generated from input_info. | None |
npu | str | The identifier of the target NPU for compilation. If None, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error is raised. Defaults to None. | None |
tensor_parallel_size | int | The number of NPU devices to use for tensor parallelism. If None, tensor parallelism is not used. Defaults to None. | None |
Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU. |
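input_info is a list of (name, shape, dtype) triples, one per model input. The helper below builds that list from a name-to-shape mapping. It is a sketch only. make_input_info is hypothetical, and expressing the dtype as a string such as "float32" is an assumption about what the SDK accepts.

```python
from typing import Dict, List, Sequence, Tuple

InputInfo = List[Tuple[str, List[int], str]]


def make_input_info(shapes: Dict[str, Sequence[int]], dtype: str = "float32") -> InputInfo:
    """Build an input_info list of (name, shape, dtype) triples.

    The dtype is given as a string (an assumption); entries keep the
    insertion order of `shapes`.
    """
    info: InputInfo = []
    for name, shape in shapes.items():
        dims = [int(d) for d in shape]
        if any(d <= 0 for d in dims):
            raise ValueError(f"input {name!r} has a non-positive dimension: {dims}")
        info.append((name, dims, dtype))
    return info
```

Assuming the SDK is imported as rebel, a call might look like rebel.compile_from_torch(model, input_info=make_input_info({"x": (1, 3, 224, 224)})).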
compile_from_torchscript(mod, input_names=None, *, npu=None, tensor_parallel_size=None)
Compiles a model from a torch.jit.ScriptModule, i.e., the result of the torch.jit.trace function.
Note that a TorchScript model must retain its input shape and dtype information to be compiled. When a TorchScript model is loaded with torch.jit.load, this information is not restored by default. Pass the additional argument _restore_shapes=True to restore it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mod | TorchScript | A PyTorch jit-traced model. | required |
input_names | Optional[List[Optional[str]]] | A list of input names. If … | None |
npu | str | The identifier of the target NPU for compilation. If None, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error is raised. Defaults to None. | None |
tensor_parallel_size | int | The number of NPU devices to use for tensor parallelism. If None, tensor parallelism is not used. Defaults to None. | None |
Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU. |
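Per the note above, a TorchScript model that was saved and later reloaded must be loaded with _restore_shapes=True before it is passed to compile_from_torchscript. Below is a minimal sketch. The helper name is hypothetical, and PyTorch is imported inside the function so the module itself imports without it.

```python
def load_torchscript_for_rbln(path, map_location="cpu"):
    """Load a saved TorchScript model with its recorded input shapes restored.

    torch.jit.load skips restoring this information unless
    _restore_shapes=True is passed, and compile_from_torchscript needs it.
    """
    import torch  # lazy import: only needed when a model is actually loaded

    return torch.jit.load(path, map_location=map_location, _restore_shapes=True)
```

The returned module would then be handed to compile_from_torchscript, for example as rebel.compile_from_torchscript(load_torchscript_for_rbln("model.pt")). The file name and the rebel import name are placeholders.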
compile_from_tf_function(func, input_info, outputs=None, layout='NHWC', *, npu=None)
Compiles a model from a tf.function.
Note that the input function must not be concretized with the get_concrete_function method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | GenericFunction | A TensorFlow function. | required |
input_info | List[Tuple[str, List[int], DType]] | A list of input information, with each input described as a triple (name, shape, dtype). If the dtype is specified as … | required |
outputs | Optional[Union[str, List[str]]] | The name of the output node, or a list of output node names (optional). If not specified, the last node is assumed to be the graph output. Specifying it is useful when the graph has multiple outputs. | None |
layout | str | Layout of the tensors used internally in the model. One of "NHWC" or "NCHW". | 'NHWC' |
npu | str | The identifier of the target NPU for compilation. If None, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error is raised. Defaults to None. | None |
Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU. |
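Below is a sketch of compiling a Keras model. It wraps the model in a tf.function and passes that function itself, not the result of get_concrete_function. It assumes the RBLN SDK is importable as rebel and that tf.float32 is accepted as the dtype in input_info. compile_keras_with_rbln is a hypothetical name.

```python
def compile_keras_with_rbln(keras_model, input_shape, npu=None):
    """Wrap a Keras model in a tf.function and compile it for an RBLN NPU.

    Assumptions: the SDK is importable as `rebel`, and tf.float32 is a
    valid dtype entry in input_info.
    """
    import tensorflow as tf  # lazy imports: only needed when actually compiling
    import rebel

    @tf.function
    def forward(x):
        return keras_model(x)

    input_info = [("x", list(input_shape), tf.float32)]
    # Pass the tf.function itself, NOT forward.get_concrete_function(...).
    return rebel.compile_from_tf_function(forward, input_info, npu=npu)
```

The default layout='NHWC' matches TensorFlow's usual channels-last convention, so a shape such as (1, 224, 224, 3) would be typical here.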
compile_from_tf_graph_def(graph_def, outputs=None, layout='NHWC', *, npu=None)
Compiles a model from a TensorFlow GraphDef.
This function lets you compile legacy TensorFlow 1.x models. If you use TensorFlow 2 by default, we recommend compiling the model in its function form with compile_from_tf_function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph_def | GraphDef | A TensorFlow graph definition in the form of a protocol buffer. | required |
outputs | Optional[Union[str, List[str]]] | The name of the output node, or a list of output node names (optional). If not specified, the last node is assumed to be the graph output. Specifying it is useful when the graph has multiple outputs. | None |
layout | str | Layout of the tensors used internally in the model. One of "NHWC" or "NCHW". | 'NHWC' |
npu | str | The identifier of the target NPU for compilation. If None, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error is raised. Defaults to None. | None |
Returns:
Type | Description |
---|---|
RBLNCompiledModel | Compiled model that can be run on the RBLN NPU. |
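Legacy TensorFlow 1.x models are often distributed as frozen .pb files. The helper below reads one into a GraphDef using the standard tf.compat.v1.GraphDef protocol buffer. The function name is hypothetical.

```python
def load_frozen_graph_def(pb_path):
    """Read a frozen TensorFlow 1.x graph (.pb) into a GraphDef protocol buffer."""
    import tensorflow as tf  # lazy import: only needed when actually loading

    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())  # deserialize the binary protobuf
    return graph_def
```

The result could then be passed to the compiler, for example as rebel.compile_from_tf_graph_def(graph_def, outputs="logits"). Here "logits" is a placeholder for your graph's actual output node name, and rebel is the assumed import name of the SDK.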
PyTorch Compile API
Functions
compile(model=None, *, dynamic=None, backend='inductor', options=None)
Optimizes the given model or function using TorchDynamo with the RBLN backend for execution on RBLN hardware.
This function compiles the input model to run efficiently on RBLN NPUs. It uses TorchDynamo for tracing and the RBLN backend to generate code optimized for RBLN hardware. To use this backend, make sure the RBLN SDK is imported before calling torch.compile.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Callable | The module or function to optimize. | None |
dynamic | bool or None | Whether to use dynamic shape tracing. The RBLN backend currently does not support dynamic shapes. | None |
backend | str or Callable | The backend to use. | 'inductor' |
options | dict | A dictionary of options to pass to the backend. Some notable options to try for the rbln backend are … | None |
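The sketch below shows one way to compile a module with torch.compile after importing the SDK. It assumes that importing rebel registers a TorchDynamo backend named "rbln". It passes dynamic=False because the RBLN backend does not support dynamic shapes, and it sets no backend options. compile_for_rbln is a hypothetical name.

```python
def compile_for_rbln(model):
    """Compile a PyTorch module with torch.compile on the (assumed) "rbln" backend."""
    import torch
    import rebel  # noqa: F401  # assumption: importing the SDK registers the backend

    # The RBLN backend does not support dynamic shapes, so disable them explicitly.
    return torch.compile(model, backend="rbln", dynamic=False)
```

Compilation is lazy under torch.compile. The first call of the returned object with example inputs triggers tracing and code generation.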
Runtime API
Classes
RuntimeBase
A base class for runtimes that provides functionality common to both synchronous and asynchronous runtimes.
The RuntimeBase class is the foundation of the Runtime and AsyncRuntime classes. It encapsulates the shared methods and attributes that manage model execution.
Functions
model_description()
Returns a description of the model currently loaded in the runtime.
This method provides a summary of the model's architecture, including details about its inputs, outputs, and memory usage on the RBLN device.
Returns:
Type | Description |
---|---|
str | A string containing the model's description. |
Runtime
A Runtime object for executing a compiled neural network on an NPU.
Functions
__init__(compiled_model, *, device=0, tensor_type='np', path=None)
Initializes a Runtime object for executing a compiled neural network on an NPU.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
compiled_model | Union[str, RBLNCompiledModel] | The path to a compiled RBLN neural network file (*.rbln) or an instance of RBLNCompiledModel. | required |
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used by the runtime. It determines whether outputs are numpy.ndarray or torch.Tensor. Defaults to "np". | 'np' |
path | str | Deprecated. Use compiled_model instead. | None |
run(*input_args, out=None, **input_kwargs)
Runs the compiled neural network with the given input tensors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length list of positional input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list or tensor in which to store the output tensors. If provided, it must contain pre-allocated tensors whose shapes match the network's output shapes. If not provided or set to None, … | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | {} |
Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. Depending on the network's architecture, this is either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type given when the Runtime object was initialized. |
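run accepts inputs either positionally (*input_args) or by name (**input_kwargs). Below is a sketch of a loop over many samples that supports both forms. run_all_samples is a hypothetical helper that works with any object exposing the documented run method.

```python
def run_all_samples(runtime, samples):
    """Run every sample through `runtime` and collect the outputs in order.

    `runtime` is any object with the documented run() method (e.g. a Runtime;
    forward() is documented as an alias of run()). Each sample may be a single
    tensor, a tuple/list of positional inputs, or a dict of named inputs.
    """
    outputs = []
    for sample in samples:
        if isinstance(sample, dict):
            outputs.append(runtime.run(**sample))  # named inputs (**input_kwargs)
        elif isinstance(sample, (list, tuple)):
            outputs.append(runtime.run(*sample))   # several positional inputs (*input_args)
        else:
            outputs.append(runtime.run(sample))    # a single input tensor
    return outputs
```

The helper does not pass out=, so each call returns fresh output tensors. Reusing one pre-allocated out buffer across iterations would overwrite earlier results unless they are copied.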
forward(*input_args, out=None, **input_kwargs)
An alias for the run method, provided for compatibility with PyTorch's naming convention.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length list of positional input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list or tensor in which to store the output tensors. If provided, it must contain pre-allocated tensors whose shapes match the network's output shapes. If not provided or set to None, … | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | {} |
Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network, as returned by the run method. |
__call__(*input_args, out=None, **input_kwargs)
Allows the Runtime object to be called like a function. This method is provided for convenience and compatibility with common neural network frameworks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length list of positional input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | () |
out | Optional[List[Union[ndarray, Tensor]]] | An optional list or tensor in which to store the output tensors. If provided, it must contain pre-allocated tensors whose shapes match the network's output shapes. If not provided or set to None, … | None |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | {} |
Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network, as returned by the run method. |
AsyncRuntime
An AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.
Functions
__init__(compiled_model, *, device=0, tensor_type='np', path=None, parallel=None)
Initializes an AsyncRuntime object for executing a compiled neural network asynchronously on an NPU.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
compiled_model | Union[str, RBLNCompiledModel] | The path to a compiled RBLN neural network file (*.rbln) or an instance of RBLNCompiledModel. | required |
device | int | The device ID of the NPU to use for execution. Defaults to 0. | 0 |
tensor_type | str | The object type of the tensors used by the runtime. It determines whether outputs are numpy.ndarray or torch.Tensor. Defaults to "np". | 'np' |
path | str | Deprecated. Use compiled_model instead. | None |
parallel | int | The number of threads used to prepare and queue input data for the NPU's input buffer. The NPU supports double buffering, so it can process one input while the next one is being prepared. This parameter helps manage the NPU's input buffer more efficiently by ensuring that the next input is ready as soon as the current computation finishes. It applies only to AsyncRuntime because of its non-blocking nature. Note: use parallel=2 with caution, as it may not work with some models (e.g., LLMs). Benchmark your specific use case to determine whether it improves performance. | None |
run(*input_args, **input_kwargs)
Runs the compiled neural network asynchronously with the given input tensors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length list of positional input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | () |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | {} |
Returns:
Type | Description |
---|---|
AsyncTask | An asynchronous task object representing the neural network execution. Use it to wait for the execution to finish. |
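AsyncRuntime.run returns an AsyncTask instead of blocking, so you can submit the next input before waiting on the previous one. The sketch below keeps a bounded number of tasks in flight and collects results in submission order with AsyncTask.wait. run_pipelined is hypothetical. Its default of two in-flight tasks is an assumption chosen to echo the double buffering described for parallel, not an SDK recommendation.

```python
from collections import deque


def run_pipelined(async_runtime, samples, max_in_flight=2, timeout=None):
    """Submit samples via AsyncRuntime.run() and wait on tasks oldest-first.

    At most `max_in_flight` tasks are outstanding at once; results are
    returned in the same order as `samples`.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    results = []
    in_flight = deque()
    for sample in samples:
        in_flight.append(async_runtime.run(sample))  # non-blocking submit
        if len(in_flight) >= max_in_flight:
            # Block on the oldest task so the window never exceeds the limit.
            results.append(in_flight.popleft().wait(timeout=timeout))
    while in_flight:  # drain whatever is still pending
        results.append(in_flight.popleft().wait(timeout=timeout))
    return results
```

Benchmark different window sizes for your model, in line with the caution given for parallel=2.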
async async_run(*input_args, **input_kwargs)
Runs the compiled neural network asynchronously and returns the result as an awaitable.
This method is a coroutine. Use it with the await keyword to run the neural network asynchronously and retrieve the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*input_args | Union[ndarray, Tensor] | Variable-length list of positional input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | () |
**input_kwargs | Union[ndarray, Tensor] | Arbitrary keyword arguments of input tensors. Each argument must be either a numpy.ndarray or a torch.Tensor. | {} |
Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. Depending on the network's architecture, this is either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type given when the AsyncRuntime object was initialized. |
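async_run can be awaited directly. Several calls can also be awaited together with asyncio.gather, which returns results in the order the awaitables were passed. Below is a minimal sketch. run_concurrently is a hypothetical helper. How many concurrent calls the runtime can usefully overlap is not specified here, so benchmark before relying on it.

```python
import asyncio


async def run_concurrently(async_runtime, samples):
    """Await AsyncRuntime.async_run() for several samples concurrently.

    Results come back in the same order as `samples`, because
    asyncio.gather preserves the order of its arguments.
    """
    coroutines = [async_runtime.async_run(sample) for sample in samples]
    return await asyncio.gather(*coroutines)
```

From synchronous code, you might call asyncio.run(run_concurrently(runtime, batches)), where runtime is an AsyncRuntime.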
AsyncTask
An asynchronous task representing a neural network execution, as returned by AsyncRuntime.run.
Functions
wait(timeout=None)
Waits for the asynchronous task to complete and returns the result.
This method blocks the calling thread until the task completes or the specified timeout is reached.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | Optional[float] | The maximum time, in seconds, to wait for the task to complete. If None, the method waits indefinitely until the task completes. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Union[ndarray, Tensor, List[Union[ndarray, Tensor]]] | The output tensor(s) of the neural network. Depending on the network's architecture, this is either a single tensor or a list of tensors. The tensor type (numpy.ndarray or torch.Tensor) is determined by the tensor_type given when the AsyncRuntime object was initialized. |
Utility API
Functions
npu_is_available(device=0)
Returns a bool indicating whether the RBLN device is currently available.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | Index of the NPU. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
bool | Whether the RBLN device is currently available. |
get_npu_name(device=0)
Returns the name of the RBLN NPU.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | int | Index of the NPU. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
str | The name of the corresponding NPU, e.g. "RBLN-CA12". |
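npu_is_available and get_npu_name can be combined to list the usable devices on a host. Below is a sketch. find_available_devices is hypothetical. The two functions are passed in as arguments so the helper stays testable, and max_devices is an arbitrary probe limit, not an SDK constant. Whether npu_is_available returns False or raises for an index with no device is not documented here.

```python
def find_available_devices(is_available, get_name, max_devices=8):
    """Return (index, name) pairs for every available RBLN device.

    `is_available` and `get_name` are expected to behave like the documented
    npu_is_available(device) and get_npu_name(device) functions.
    """
    found = []
    for index in range(max_devices):  # probe device indices 0 .. max_devices-1
        if is_available(index):
            found.append((index, get_name(index)))
    return found
```

Assuming both functions are exposed at the top level of the rebel package, you might call find_available_devices(rebel.npu_is_available, rebel.get_npu_name).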