APIs

PyTorch Native APIs

Most PyTorch native functions can be used on RBLN NPUs.

RBLN Specific APIs

The following classes and functions are specific to RBLN NPUs and are defined in the `torch.rbln` module.

Classes

`device_of(obj)`

Context manager that changes the current device to that of the given object.

You can use both tensors and storages as arguments. If a given object is not allocated on an RBLN device, this is a no-op.

Parameters:

Name	Type	Description	Default
`obj`	`Tensor or Storage`	object allocated on the selected device.	required

Example

import torch

tensor = torch.tensor([2, 64], device="rbln:1")
with torch.rbln.device_of(tensor):
    # Operations here use the same device as `tensor` (rbln:1).
    new_tensor = torch.zeros(3, device="rbln")

Functions

`get_amp_supported_dtype()`

Get a list of data types supported by automatic mixed precision (AMP) on RBLN devices.

Returns:

Type	Description
`List[torch.dtype]`	A list of data types supported by AMP.

Note

This function currently returns only torch.float16.
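
As a minimal sketch of how the returned list might be used, the helper below chooses a dtype for mixed precision. `pick_amp_dtype` is a hypothetical name, and passing `torch.rbln` as the `rbln` argument is an assumption made so that the helper can also be exercised without hardware.

```python
from typing import Any, Optional


def pick_amp_dtype(rbln: Any, preferred: Any) -> Optional[Any]:
    """Return `preferred` if AMP supports it, otherwise the first supported dtype.

    `rbln` is expected to behave like the torch.rbln module (assumption).
    Returns None when the supported list is empty.
    """
    supported = list(rbln.get_amp_supported_dtype())
    if preferred in supported:
        return preferred
    return supported[0] if supported else None


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    torch, rbln = None, None

if rbln is not None:
    print(pick_amp_dtype(rbln, torch.float16))
```

Querying the list at run time, rather than hard-coding `torch.float16`, keeps calling code valid if the set of supported types changes.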

`is_available()`

Check if any RBLN devices are available.

Returns:

Name	Type	Description
`bool`	`bool`	True if at least one RBLN device is available, False otherwise.

`current_device()`

Get the index of the currently selected RBLN device.

Returns:

Name	Type	Description
`int`	`int`	The index of the currently selected RBLN device.

`device_count()`

Get the number of available RBLN devices.

Returns:

Name	Type	Description
`int`	`int`	The number of available RBLN devices.

`set_device(device)`

Set the current device.

Parameters:

Name	Type	Description	Default
`device`	`device or int or str`	selected device.	required
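
The device query and selection functions above are typically used together. The following is a sketch only: `select_device` is a hypothetical helper, the CPU fallback is a design choice of this example, and `rbln` is assumed to be the `torch.rbln` module (or `None` when it is not installed).

```python
from typing import Any


def select_device(rbln: Any, index: int = 0) -> str:
    """Select RBLN device `index` and return the device string to use.

    Falls back to "cpu" when no RBLN device is available.
    """
    if rbln is None or not rbln.is_available():
        return "cpu"
    if not 0 <= index < rbln.device_count():
        raise ValueError(f"device index {index} is out of range")
    rbln.set_device(index)
    # Read the index back so the returned string reflects the actual selection.
    return f"rbln:{rbln.current_device()}"


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    rbln = None

print(select_device(rbln))
```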

`empty_cache(device=None)`

Release all unoccupied cached memory currently held by the caching allocator so that it can be used by other applications.

Note that only unfragmented (non-split) blocks can be released; fragmented blocks that have been split will remain in the cache until they can be coalesced.

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to empty cache for. If None, uses the current device. Defaults to None.	`None`
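
One way to see the effect of `empty_cache` is to compare `memory_reserved` before and after the call. This is a sketch under assumptions: `release_cached_memory` is a hypothetical helper and `rbln` stands for the `torch.rbln` module.

```python
from typing import Any


def release_cached_memory(rbln: Any, device: Any = None) -> int:
    """Call empty_cache and return how many reserved bytes were released."""
    before = rbln.memory_reserved(device)
    rbln.empty_cache(device)
    after = rbln.memory_reserved(device)
    return before - after


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    rbln = None

if rbln is not None and rbln.is_available():
    print("released bytes:", release_cached_memory(rbln))
```

A result of zero does not necessarily mean the cache was empty, since split blocks stay cached until they can be coalesced.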

`memory_allocated(device=None)`

Return the current device memory occupied by tensors in bytes for a given device.

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to query. If None, uses the current device. Defaults to None.	`None`

Returns:

Name	Type	Description
`int`	`int`	The current memory occupied by tensors in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see `memory_stats()`.

`memory_reserved(device=None)`

Return the current device memory managed by the caching allocator in bytes for a given device.

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to query. If None, uses the current device. Defaults to None.	`None`

Returns:

Name	Type	Description
`int`	`int`	The current memory managed by the caching allocator in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see `memory_stats()`.
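
`memory_allocated` and `memory_reserved` are often read together, since their difference approximates memory held by the caching allocator but not occupied by tensors. The sketch below assumes `rbln` is the `torch.rbln` module; `memory_snapshot` and its dictionary keys are hypothetical names.

```python
from typing import Any, Dict


def memory_snapshot(rbln: Any, device: Any = None) -> Dict[str, int]:
    """Return allocated, reserved, and reserved-but-unused bytes for a device."""
    allocated = rbln.memory_allocated(device)
    reserved = rbln.memory_reserved(device)
    return {
        "allocated_bytes": allocated,
        "reserved_bytes": reserved,
        # Bytes managed by the caching allocator but not occupied by tensors.
        "reserved_unused_bytes": reserved - allocated,
    }


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    rbln = None

if rbln is not None and rbln.is_available():
    print(memory_snapshot(rbln))
```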

`max_memory_allocated(device=None)`

Return the maximum device memory occupied by tensors in bytes for a given device.

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to query. If None, uses the current device. Defaults to None.	`None`

Returns:

Name	Type	Description
`int`	`int`	The maximum memory occupied by tensors in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see `memory_stats()`.

`max_memory_reserved(device=None)`

Return the maximum device memory managed by the caching allocator in bytes for a given device.

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to query. If None, uses the current device. Defaults to None.	`None`

Returns:

Name	Type	Description
`int`	`int`	The maximum memory managed by the caching allocator in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see `memory_stats()`.

`memory_stats(device=None)`

Return a dictionary of device memory allocator statistics for a given device.

The returned dictionary contains various memory statistics including:

allocated.current: Current memory occupied by tensors
allocated.peak: Peak memory occupied by tensors
allocated.total_allocated: Total memory allocated to tensors (cumulative)
allocated.total_freed: Total memory freed from tensors (cumulative)
reserved.current: Current memory managed by the caching allocator
reserved.peak: Peak memory managed by the caching allocator
reserved.total_allocated: Total memory allocated by the caching allocator (cumulative)
reserved.total_freed: Total memory freed by the caching allocator (cumulative)
active.current: Current size of blocks in use (may differ from allocated due to block granularity)
active.peak: Peak size of blocks in use
cached.current: Current size of cached blocks available for reuse
cached.peak: Peak size of cached blocks
num_alloc_retries: Number of allocation retries after cache flush
num_ooms: Number of out-of-memory errors
num_device_alloc: Number of device memory acquisitions
num_device_free: Number of device memory releases
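
The fields listed above can be condensed into a few derived figures. This sketch assumes the dictionary uses the flat keys exactly as written in the list (for example `"allocated.current"`); `summarize_stats` and its output keys are hypothetical.

```python
from typing import Any, Dict


def summarize_stats(stats: Dict[str, int]) -> Dict[str, Any]:
    """Derive a short summary from a memory_stats()-style dictionary."""
    reserved = stats.get("reserved.current", 0)
    allocated = stats.get("allocated.current", 0)
    return {
        # Fraction of reserved memory that tensors currently occupy.
        "utilization": allocated / reserved if reserved else 0.0,
        "peak_allocated_mib": stats.get("allocated.peak", 0) / 2**20,
        "had_oom": stats.get("num_ooms", 0) > 0,
    }


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    rbln = None

if rbln is not None and rbln.is_available():
    print(summarize_stats(rbln.memory_stats()))
```

Missing keys are treated as zero, so the helper tolerates versions that expose a different set of fields.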

Lazy Tensor Memory Allocation: All memory-related functions in this module (including `memory_allocated()`, `memory_reserved()`, `max_memory_allocated()`, `max_memory_reserved()`, and this function) reflect device memory only, not CPU memory.

RBLN tensors use lazy memory allocation for device memory. When you create a tensor on an RBLN device:

The tensor is initially allocated in CPU memory immediately upon creation
Device memory allocation is deferred until the tensor is actually needed for device operations
When a device operation is required, the tensor data is lazily transferred from CPU to device memory

Because of this lazy allocation strategy, memory statistics may be lower than expected right after tensor creation and stay that way until the tensors are used in device computations. Device memory statistics increase when tensors are materialized on the device during actual computation.

The statistics also include lazy-tensor metrics, which describe memory management for tensors that have not yet been materialized on the device. The specific lazy-tensor fields may vary between implementation versions.

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to query. If None, uses the current device. Defaults to None.	`None`

Returns:

Type	Description
`Dict[str, int]`	A dictionary containing device memory statistics. Note that these statistics reflect device memory only (not CPU memory) and may not include memory for tensors that have not yet been transferred to the device.

Note

To see accurate device memory usage, check statistics after performing operations that require the tensors to be materialized on the device, as device memory is allocated lazily when needed. This applies to all memory-related functions in this module.
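
To illustrate the effect of lazy allocation, the sketch below measures how much device memory a piece of work adds. `device_bytes_added` is a hypothetical helper, `rbln` is assumed to be the `torch.rbln` module, and the assumption that an element-wise addition is enough to materialize a tensor on the device is not confirmed by this page.

```python
from typing import Any, Callable


def device_bytes_added(rbln: Any, run: Callable[[], Any], device: Any = None) -> int:
    """Return the change in memory_allocated caused by calling `run`."""
    before = rbln.memory_allocated(device)
    run()
    return rbln.memory_allocated(device) - before


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    torch, rbln = None, None

if rbln is not None and rbln.is_available():
    x = torch.ones(1024, device="rbln")
    # Right after creation the device statistics may not include `x` yet.
    print("after creation:", rbln.memory_allocated())
    results = []  # keep the output alive so it is not freed before measuring
    added = device_bytes_added(rbln, lambda: results.append(x + 1))
    print("added by computation:", added)
```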

`reset_peak_memory_stats(device=None)`

Reset the "peak" stats tracked by the caching allocator for a given device.

This function resets the peak values to their current values for the following stats:

allocated.peak: Reset to allocated.current
reserved.peak: Reset to reserved.current
active.peak: Reset to active.current
cached.peak: Reset to cached.current

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to reset stats for. If None, uses the current device. Defaults to None.	`None`
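
Resetting the peak values makes it possible to measure the peak of a single code region. The context manager below is a sketch: `track_peak` and its report keys are hypothetical, and `rbln` is assumed to be the `torch.rbln` module.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def track_peak(rbln: Any, device: Any = None) -> Iterator[Dict[str, int]]:
    """Record peak allocated and reserved bytes reached inside the with-block."""
    rbln.reset_peak_memory_stats(device)  # peaks now start from the current values
    report: Dict[str, int] = {}
    try:
        yield report
    finally:
        report["peak_allocated_bytes"] = rbln.max_memory_allocated(device)
        report["peak_reserved_bytes"] = rbln.max_memory_reserved(device)


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    torch, rbln = None, None

if rbln is not None and rbln.is_available():
    with track_peak(rbln) as report:
        y = torch.ones(1024, device="rbln") * 2
    print(report)
```

Because the peaks are reset to the current values rather than to zero, memory already in use before the block still counts toward the reported peak.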

`reset_accumulated_memory_stats(device=None)`

Reset the "accumulated" (historical) stats tracked by the caching allocator for a given device.

This function resets the following accumulated stats to zero:

allocated.total_allocated
allocated.total_freed
reserved.total_allocated
reserved.total_freed
num_alloc_retries
num_ooms
num_device_alloc
num_device_free

Parameters:

Name	Type	Description	Default
`device`	`int, str, or torch.device`	The device to reset stats for. If None, uses the current device. Defaults to None.	`None`
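
Resetting the accumulated counters before a workload lets the counters describe that workload alone. In this sketch, `allocator_activity` is a hypothetical helper, `rbln` is assumed to be the `torch.rbln` module, and the counter names are taken from the list above as flat dictionary keys.

```python
from typing import Any, Callable, Dict


def allocator_activity(rbln: Any, run: Callable[[], Any], device: Any = None) -> Dict[str, int]:
    """Reset accumulated stats, call `run`, and return the event counters."""
    rbln.reset_accumulated_memory_stats(device)
    run()
    stats = rbln.memory_stats(device)
    keys = ("num_device_alloc", "num_device_free", "num_alloc_retries", "num_ooms")
    # Treat a missing counter as zero.
    return {key: stats.get(key, 0) for key in keys}


try:
    import torch
    rbln = getattr(torch, "rbln", None)  # None if the RBLN backend is not installed
except ImportError:
    torch, rbln = None, None

if rbln is not None and rbln.is_available():
    print(allocator_activity(rbln, lambda: torch.ones(8, device="rbln") + 1))
```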
