APIs

PyTorch Native APIs

Most native PyTorch functions can be used on the RBLN NPU.

RBLN Specific APIs

The following classes and functions are specific to the RBLN NPU and are defined in the torch.rbln module.

Classes

device_of(obj)

Context manager that changes the current device to that of the given object.

You can use both tensors and storages as arguments. If a given object is not allocated on an RBLN device, this is a no-op.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
obj | Tensor or Storage | Object allocated on the selected device. | required

Example
import torch

tensor = torch.tensor([2, 64], device="rbln:1")
with torch.rbln.device_of(tensor):
    # operations here will use the same device as tensor (rbln:1)
    new_tensor = torch.zeros(3, device="rbln")

Functions

get_amp_supported_dtype()

Get a list of data types supported by automatic mixed precision (AMP) on RBLN devices.

Returns:

Type | Description
--- | ---
List[torch.dtype] | A list of data types supported by AMP.

Note

This function currently returns only torch.float16.
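
As a minimal sketch of how the returned list might be used, the helper below picks the first preferred dtype that the backend reports as supported. `pick_amp_dtype` is a hypothetical name, not part of torch.rbln, and the sketch assumes only that the return value supports membership tests.

```python
def pick_amp_dtype(supported, preferred):
    """Return the first entry of `preferred` found in `supported`, or None."""
    for dtype in preferred:
        if dtype in supported:
            return dtype
    return None


# Intended use on a machine with the RBLN backend installed (not run here):
#   import torch
#   supported = torch.rbln.get_amp_supported_dtype()
#   dtype = pick_amp_dtype(supported, [torch.bfloat16, torch.float16])
```

Since the function currently returns only torch.float16, a caller that requires another dtype would get None from this helper and would have to fall back to full precision.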

is_available()

Check if any RBLN devices are available.

Returns:

Name | Type | Description
--- | --- | ---
bool | bool | True if at least one RBLN device is available, False otherwise.

current_device()

Get the index of the currently selected RBLN device.

Returns:

Name | Type | Description
--- | --- | ---
int | int | The index of the currently selected RBLN device.

device_count()

Get the number of available RBLN devices.

Returns:

Name | Type | Description
--- | --- | ---
int | int | The number of available RBLN devices.

set_device(device)

Set the current device.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | device or int or str | Selected device. | required
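
The device query and selection functions above can be combined as in the following sketch. `select_device` is a hypothetical helper that takes the module object as a parameter (pass `torch.rbln`). It assumes that valid indices run from 0 to device_count() - 1, which the text above does not state explicitly.

```python
def select_device(rbln, index=0):
    """Make `index` the current RBLN device and return the active index.

    Returns None when no RBLN device is available.
    """
    if not rbln.is_available():
        return None
    count = rbln.device_count()
    # Assumption: valid indices run from 0 to device_count() - 1.
    if not 0 <= index < count:
        raise ValueError(f"device index {index} out of range (found {count})")
    rbln.set_device(index)
    return rbln.current_device()


# Intended use (requires the RBLN backend):
#   import torch
#   active = select_device(torch.rbln, 1)
```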

empty_cache(device=None)

Release all unoccupied cached memory currently held by the caching allocator so that it can be used by other applications.

Note that only unfragmented (non-split) blocks can be released; fragmented blocks that have been split will remain in the cache until they can be coalesced.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to empty cache for. If None, uses the current device. Defaults to None. | None
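
One way to see how much memory a call actually returns is to compare memory_reserved() before and after it. The sketch below does that. `release_cached_memory` is a hypothetical helper that takes the module object as a parameter (pass `torch.rbln`).

```python
def release_cached_memory(rbln, device=None):
    """Call empty_cache() and return how many reserved bytes were released."""
    before = rbln.memory_reserved(device)
    rbln.empty_cache(device)
    after = rbln.memory_reserved(device)
    return before - after


# Intended use (requires the RBLN backend):
#   import torch
#   freed = release_cached_memory(torch.rbln)
```

A result of 0 does not necessarily mean the cache was empty: as noted above, split blocks stay cached until they can be coalesced.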

memory_allocated(device=None)

Return the current device memory occupied by tensors in bytes for a given device.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to query. If None, uses the current device. Defaults to None. | None

Returns:

Name | Type | Description
--- | --- | ---
int | int | The current memory occupied by tensors in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see memory_stats().

memory_reserved(device=None)

Return the current device memory managed by the caching allocator in bytes for a given device.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to query. If None, uses the current device. Defaults to None. | None

Returns:

Name | Type | Description
--- | --- | ---
int | int | The current memory managed by the caching allocator in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see memory_stats().

max_memory_allocated(device=None)

Return the maximum device memory occupied by tensors in bytes for a given device.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to query. If None, uses the current device. Defaults to None. | None

Returns:

Name | Type | Description
--- | --- | ---
int | int | The maximum memory occupied by tensors in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see memory_stats().

max_memory_reserved(device=None)

Return the maximum device memory managed by the caching allocator in bytes for a given device.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to query. If None, uses the current device. Defaults to None. | None

Returns:

Name | Type | Description
--- | --- | ---
int | int | The maximum memory managed by the caching allocator in bytes.

Note

This function reflects device memory only. For information about lazy memory allocation, see memory_stats().
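
The four byte counters above are often read together. The following sketch collects them into one dictionary and formats byte counts for display. `memory_snapshot` and `format_mib` are hypothetical helpers. The first takes the module object as a parameter (pass `torch.rbln`).

```python
def memory_snapshot(rbln, device=None):
    """Collect the four device-memory counters, in bytes."""
    return {
        "allocated": rbln.memory_allocated(device),
        "reserved": rbln.memory_reserved(device),
        "max_allocated": rbln.max_memory_allocated(device),
        "max_reserved": rbln.max_memory_reserved(device),
    }


def format_mib(num_bytes):
    """Format a byte count as mebibytes with one decimal place."""
    return f"{num_bytes / (1024 * 1024):.1f} MiB"


# Intended use (requires the RBLN backend):
#   import torch
#   for name, value in memory_snapshot(torch.rbln).items():
#       print(name, format_mib(value))
```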

memory_stats(device=None)

Return a dictionary of device memory allocator statistics for a given device.

The returned dictionary contains various memory statistics including:

  • allocated.current: Current memory occupied by tensors
  • allocated.peak: Peak memory occupied by tensors
  • allocated.total_allocated: Total memory allocated to tensors (cumulative)
  • allocated.total_freed: Total memory freed from tensors (cumulative)
  • reserved.current: Current memory managed by the caching allocator
  • reserved.peak: Peak memory managed by the caching allocator
  • reserved.total_allocated: Total memory allocated by the caching allocator (cumulative)
  • reserved.total_freed: Total memory freed by the caching allocator (cumulative)
  • active.current: Current size of blocks in use (may differ from allocated due to block granularity)
  • active.peak: Peak size of blocks in use
  • cached.current: Current size of cached blocks available for reuse
  • cached.peak: Peak size of cached blocks
  • num_alloc_retries: Number of allocation retries after cache flush
  • num_ooms: Number of out-of-memory errors
  • num_device_alloc: Number of device memory acquisitions
  • num_device_free: Number of device memory releases
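
As an illustration of reading these fields, the sketch below condenses a statistics dictionary into a few derived values. `summarize_stats` is a hypothetical helper. It assumes the dictionary uses the flat dotted keys exactly as listed above, and it falls back to 0 for any key that is absent.

```python
def summarize_stats(stats):
    """Derive a short summary from a memory_stats()-style dictionary."""
    allocated = stats.get("allocated.current", 0)
    reserved = stats.get("reserved.current", 0)
    cached = stats.get("cached.current", 0)
    return {
        "allocated_bytes": allocated,
        "reserved_bytes": reserved,
        "cached_bytes": cached,
        # Share of allocator-managed memory currently occupied by tensors.
        "utilization": allocated / reserved if reserved else 0.0,
        "had_oom": stats.get("num_ooms", 0) > 0,
    }


# Intended use (requires the RBLN backend):
#   import torch
#   print(summarize_stats(torch.rbln.memory_stats()))
```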

Lazy Tensor Memory Allocation: All memory-related functions in this module (including memory_allocated(), memory_reserved(), max_memory_allocated(), max_memory_reserved(), and this function) reflect device memory only, not CPU memory.

RBLN tensors use lazy memory allocation for device memory. When you create a tensor on an RBLN device:

  • The tensor is initially allocated in CPU memory immediately upon creation
  • Device memory allocation is deferred until the tensor is actually needed for device operations
  • When a device operation is required, the tensor data is lazily transferred from CPU to device memory

Because of this lazy allocation strategy, device memory statistics may be lower than expected immediately after tensor creation. They increase once the tensors are materialized on the device during actual computation.

The statistics also include lazy-tensor-related metrics, which describe memory management for tensors that have not yet been materialized on the device. The specific fields for these metrics may vary depending on the implementation version.
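
The lazy allocation behavior described above can be observed by measuring how memory_allocated() changes across a single step. `allocation_delta` below is a hypothetical helper that takes the module object as a parameter (pass `torch.rbln`). The usage in the comments is an untested sketch, and whether the second delta is visible immediately may depend on how the backend schedules device work.

```python
def allocation_delta(rbln, fn, device=None):
    """Run `fn` and return (result, change in allocated device bytes)."""
    before = rbln.memory_allocated(device)
    result = fn()
    after = rbln.memory_allocated(device)
    return result, after - before


# Intended use (requires the RBLN backend):
#   import torch
#   x, on_create = allocation_delta(
#       torch.rbln, lambda: torch.ones(1024, device="rbln"))
#   y, on_compute = allocation_delta(torch.rbln, lambda: x + 1)
# Under lazy allocation, `on_create` is expected to be small or zero,
# while `on_compute` should reflect the tensors materialized on the device.
```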

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to query. If None, uses the current device. Defaults to None. | None

Returns:

Type | Description
--- | ---
Dict[str, int] | A dictionary containing device memory statistics. Note that these statistics reflect device memory only (not CPU memory) and may not include memory for tensors that have not yet been transferred to the device.

Note

To see accurate device memory usage, check statistics after performing operations that require the tensors to be materialized on the device, as device memory is allocated lazily when needed. This applies to all memory-related functions in this module.

reset_peak_memory_stats(device=None)

Reset the "peak" stats tracked by the caching allocator for a given device.

This function resets the peak values to their current values for the following stats:

  • allocated.peak: Reset to allocated.current
  • reserved.peak: Reset to reserved.current
  • active.peak: Reset to active.current
  • cached.peak: Reset to cached.current

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to reset stats for. If None, uses the current device. Defaults to None. | None
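
A common use of this reset is to measure the peak memory of one region of code. The sketch below wraps that pattern in a context manager. `track_peak` is a hypothetical helper that takes the module object as a parameter (pass `torch.rbln`).

```python
from contextlib import contextmanager


@contextmanager
def track_peak(rbln, device=None):
    """Reset peak stats on entry and record the peaks reached on exit."""
    rbln.reset_peak_memory_stats(device)
    result = {}
    try:
        yield result
    finally:
        # Filled in when the block exits, even if it raised.
        result["peak_allocated"] = rbln.max_memory_allocated(device)
        result["peak_reserved"] = rbln.max_memory_reserved(device)


# Intended use (requires the RBLN backend):
#   import torch
#   with track_peak(torch.rbln) as peak:
#       run_model()  # hypothetical workload
#   print(peak["peak_allocated"])
```

Because the reset sets each peak to its current value rather than to zero, the recorded peaks include memory that was already held when the block was entered.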

reset_accumulated_memory_stats(device=None)

Reset the "accumulated" (historical) stats tracked by the caching allocator for a given device.

This function resets the following accumulated stats to zero:

  • allocated.total_allocated
  • allocated.total_freed
  • reserved.total_allocated
  • reserved.total_freed
  • num_alloc_retries
  • num_ooms
  • num_device_alloc
  • num_device_free

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
device | int, str, or torch.device | The device to reset stats for. If None, uses the current device. Defaults to None. | None
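
Resetting the accumulated stats makes it possible to count allocator events for a single workload. In the sketch below, `count_allocator_events` is a hypothetical helper that takes the module object as a parameter (pass `torch.rbln`). It assumes memory_stats() exposes the counters under the flat key names listed above and treats a missing key as 0.

```python
EVENT_KEYS = (
    "num_alloc_retries",
    "num_ooms",
    "num_device_alloc",
    "num_device_free",
)


def count_allocator_events(rbln, fn, device=None):
    """Zero the accumulated counters, run `fn`, and return the event counts."""
    rbln.reset_accumulated_memory_stats(device)
    fn()
    stats = rbln.memory_stats(device)
    return {key: stats.get(key, 0) for key in EVENT_KEYS}


# Intended use (requires the RBLN backend):
#   import torch
#   events = count_allocator_events(torch.rbln, run_model)  # hypothetical workload
```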