APIs
PyTorch Native APIs
Most native PyTorch functions can be used on RBLN NPUs without modification.
RBLN-Specific APIs
The following functions are RBLN NPU-specific functions defined in the torch.rbln module. They are documented in English for clarity.
Classes
device_of(obj)
Context manager that changes the current device to that of the given object.
You can use both tensors and storages as arguments. If a given object is not allocated on an RBLN device, this is a no-op.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj | Tensor or Storage | Object allocated on the selected device. | required |
Example
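A minimal sketch of how the context manager might be used. The helper name `run_on_device_of` is hypothetical, and the backend is passed in as `rbln` (expected to be `torch.rbln` on a machine with the RBLN runtime installed) so that the snippet can be imported without the hardware.

```python
def run_on_device_of(rbln, obj, fn):
    """Call fn(obj) while the current device is switched to the one holding obj.

    `rbln` is the backend module (pass `torch.rbln` on an RBLN machine).
    Per the description above, the switch is a no-op when obj is not
    allocated on an RBLN device.
    """
    with rbln.device_of(obj):
        return fn(obj)


# Usage on an RBLN machine (not executed here):
#   import torch
#   out = run_on_device_of(torch.rbln, some_tensor, lambda t: t * 2)
```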
Functions
get_amp_supported_dtype()
Get a list of data types supported by automatic mixed precision (AMP) on RBLN devices.
Returns:
| Type | Description |
|---|---|
| List[torch.dtype] | A list of data types supported by AMP. |
Note
This function currently returns only torch.float16.
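As an illustration, the returned list can be used to choose a dtype before enabling mixed precision. The sketch below is an assumption about usage, not part of the documented API: `pick_amp_dtype` is a hypothetical helper, and `rbln` stands for the `torch.rbln` module.

```python
def pick_amp_dtype(rbln, preferred):
    """Return the first dtype in `preferred` that AMP supports, or None.

    `rbln` is the backend module (pass `torch.rbln` on an RBLN machine).
    """
    supported = rbln.get_amp_supported_dtype()
    for dtype in preferred:
        if dtype in supported:
            return dtype
    return None


# Usage on an RBLN machine (not executed here):
#   import torch
#   dtype = pick_amp_dtype(torch.rbln, [torch.bfloat16, torch.float16])
```

Since the function currently returns only torch.float16, a caller that prefers another dtype would fall through to torch.float16 or to None.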
is_available()
Check if any RBLN devices are available.
Returns:
| Type | Description |
|---|---|
| bool | True if at least one RBLN device is available, False otherwise. |
current_device()
Get the index of the currently selected RBLN device.
Returns:
| Type | Description |
|---|---|
| int | The index of the currently selected RBLN device. |
device_count()
Get the number of available RBLN devices.
Returns:
| Type | Description |
|---|---|
| int | The number of available RBLN devices. |
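The three query functions above can be combined to enumerate devices defensively. This is a sketch under the assumption that device indices run from 0 to `device_count() - 1`, which the source does not state explicitly; `available_device_indices` is a hypothetical helper and `rbln` stands for `torch.rbln`.

```python
def available_device_indices(rbln):
    """Return the list of usable device indices, or [] if no device exists.

    Assumes indices are contiguous and start at 0 (not confirmed by the
    reference above).
    """
    if not rbln.is_available():
        return []
    return list(range(rbln.device_count()))


# Usage on an RBLN machine (not executed here):
#   import torch
#   print(available_device_indices(torch.rbln), torch.rbln.current_device())
```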
set_device(device)
Set the current device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | device or int or str | Selected device. | required |
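When a device should be selected only for a limited scope, `set_device` can be paired with `current_device` to restore the previous selection afterwards. The context manager below is a hypothetical convenience wrapper, not part of `torch.rbln`; `rbln` stands for that module.

```python
from contextlib import contextmanager


@contextmanager
def use_device(rbln, device):
    """Temporarily select `device`, restoring the previous device on exit."""
    previous = rbln.current_device()  # index of the device selected so far
    rbln.set_device(device)
    try:
        yield device
    finally:
        # Restore the earlier selection even if the body raised.
        rbln.set_device(previous)


# Usage on an RBLN machine (not executed here):
#   import torch
#   with use_device(torch.rbln, 1):
#       ...  # work that should run with device 1 selected
```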
empty_cache(device=None)
Release all unoccupied cached memory currently held by the caching allocator so that it can be used by other applications.
Note that only unfragmented (non-split) blocks can be released; fragmented blocks that have been split will remain in the cache until they can be coalesced.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | int, str, or torch.device | The device to empty the cache for. If None, uses the current device. | None |
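One way to see the effect of `empty_cache` is to compare the memory managed by the caching allocator before and after the call. The sketch below assumes that released blocks are reflected in `memory_reserved`, as in other caching allocators; `release_cached_memory` is a hypothetical helper and `rbln` stands for `torch.rbln`.

```python
def release_cached_memory(rbln, device=None):
    """Empty the allocator cache and return the number of bytes released.

    Assumes `memory_reserved` shrinks when cached blocks are released.
    The result may be 0 when only fragmented (split) blocks are cached.
    """
    before = rbln.memory_reserved(device)
    rbln.empty_cache(device)
    after = rbln.memory_reserved(device)
    return before - after


# Usage on an RBLN machine (not executed here):
#   import torch
#   freed = release_cached_memory(torch.rbln)
```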
memory_allocated(device=None)
Return the current device memory occupied by tensors in bytes for a given device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | int, str, or torch.device | The device to query. If None, uses the current device. | None |
Returns:
| Type | Description |
|---|---|
| int | The current memory occupied by tensors, in bytes. |
Note
This function reflects device memory only. For information about lazy memory allocation, see memory_stats().
memory_reserved(device=None)
Return the current device memory managed by the caching allocator in bytes for a given device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | int, str, or torch.device | The device to query. If None, uses the current device. | None |
Returns:
| Type | Description |
|---|---|
| int | The current memory managed by the caching allocator, in bytes. |
Note
This function reflects device memory only. For information about lazy memory allocation, see memory_stats().
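The difference between reserved and allocated memory gives a rough idea of how much device memory the allocator is holding without any tensor occupying it. This sketch assumes that reserved memory includes allocated memory, which is typical for caching allocators but is not stated in this reference; `unused_reserved_bytes` is a hypothetical helper and `rbln` stands for `torch.rbln`.

```python
def unused_reserved_bytes(rbln, device=None):
    """Estimate bytes held by the caching allocator but not used by tensors.

    Assumes reserved >= allocated; the result is clamped at 0 in case the
    two counters are sampled at slightly different moments.
    """
    reserved = rbln.memory_reserved(device)
    allocated = rbln.memory_allocated(device)
    return max(reserved - allocated, 0)


# Usage on an RBLN machine (not executed here):
#   import torch
#   print(unused_reserved_bytes(torch.rbln))
```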
max_memory_allocated(device=None)
Return the maximum device memory occupied by tensors in bytes for a given device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | int, str, or torch.device | The device to query. If None, uses the current device. | None |
Returns:
| Type | Description |
|---|---|
| int | The maximum memory occupied by tensors, in bytes. |
Note
This function reflects device memory only. For information about lazy memory allocation, see memory_stats().
max_memory_reserved(device=None)
Return the maximum device memory managed by the caching allocator in bytes for a given device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | int, str, or torch.device | The device to query. If None, uses the current device. | None |
Returns:
| Type | Description |
|---|---|
| int | The maximum memory managed by the caching allocator, in bytes. |
Note
This function reflects device memory only. For information about lazy memory allocation, see memory_stats().
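The four byte counters above are often easier to read together. The following sketch collects them into one dictionary converted to MiB; `memory_snapshot_mib` and its key names are hypothetical, and `rbln` stands for `torch.rbln`.

```python
def memory_snapshot_mib(rbln, device=None):
    """Collect current and peak device memory counters, converted to MiB."""
    mib = 1024 * 1024
    return {
        "allocated": rbln.memory_allocated(device) / mib,
        "reserved": rbln.memory_reserved(device) / mib,
        "max_allocated": rbln.max_memory_allocated(device) / mib,
        "max_reserved": rbln.max_memory_reserved(device) / mib,
    }


# Usage on an RBLN machine (not executed here):
#   import torch
#   print(memory_snapshot_mib(torch.rbln))
```

Because device memory is allocated lazily, a snapshot taken right after creating tensors may show smaller values than one taken after the tensors have been used in a device operation.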
memory_stats(device=None)
Return a dictionary of device memory allocator statistics for a given device.
The returned dictionary contains various memory statistics including:
- allocated.current: Current memory occupied by tensors
- allocated.peak: Peak memory occupied by tensors
- allocated.total_allocated: Total memory allocated to tensors (cumulative)
- allocated.total_freed: Total memory freed from tensors (cumulative)
- reserved.current: Current memory managed by the caching allocator
- reserved.peak: Peak memory managed by the caching allocator
- reserved.total_allocated: Total memory allocated by the caching allocator (cumulative)
- reserved.total_freed: Total memory freed by the caching allocator (cumulative)
- active.current: Current size of blocks in use (may differ from allocated due to block granularity)
- active.peak: Peak size of blocks in use
- cached.current: Current size of cached blocks available for reuse
- cached.peak: Peak size of cached blocks
- num_alloc_retries: Number of allocation retries after cache flush
- num_ooms: Number of out-of-memory errors
- num_device_alloc: Number of device memory acquisitions
- num_device_free: Number of device memory releases
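For display purposes, the statistics listed above can be regrouped by their prefix. This sketch assumes the returned dictionary is flat with dotted keys such as allocated.current, as the list suggests; `group_stats` and the "counters" group name are hypothetical.

```python
def group_stats(stats):
    """Regroup a flat stats dict by the prefix before the first dot.

    Keys without a dot (for example num_ooms) are placed under "counters".
    """
    grouped = {}
    for key, value in stats.items():
        group, _, metric = key.partition(".")
        if not metric:
            # Flat counters have no "group.metric" form.
            group, metric = "counters", key
        grouped.setdefault(group, {})[metric] = value
    return grouped


# Usage on an RBLN machine (not executed here):
#   import torch
#   print(group_stats(torch.rbln.memory_stats()))
```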
Lazy Tensor Memory Allocation:
All memory-related functions in this module (including memory_allocated(), memory_reserved(), max_memory_allocated(), max_memory_reserved(), and this function) reflect device memory only, not CPU memory.
RBLN tensors use lazy memory allocation for device memory. When you create a tensor on an RBLN device:
- The tensor is initially allocated in CPU memory immediately upon creation
- Device memory allocation is deferred until the tensor is actually needed for device operations
- When a device operation is required, the tensor data is lazily transferred from CPU to device memory
Because of this lazy allocation strategy, memory statistics may be lower than expected immediately after tensor creation and remain so until the tensors are used in device computations. Device memory statistics increase when tensors are materialized on the device during actual computation.
The statistics also include metrics related to lazy tensors, which provide insight into memory management for tensors that have not yet been materialized on the device. The specific lazy-tensor statistics fields may vary depending on the implementation version.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | int, str, or torch.device | The device to query. If None, uses the current device. | None |
Returns:
| Type | Description |
|---|---|
| Dict[str, int] | A dictionary containing device memory statistics. Note that these statistics reflect device memory only (not CPU memory) and may not include memory for tensors that have not yet been transferred to the device. |
Note
To see accurate device memory usage, check statistics after performing operations that require the tensors to be materialized on the device, as device memory is allocated lazily when needed. This applies to all memory-related functions in this module.
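To observe the lazy allocation described above, statistics can be sampled before and after a step that forces tensors onto the device. The sketch below only computes the difference between two samples; `stats_delta` and `device_memory_delta` are hypothetical helpers, `rbln` stands for `torch.rbln`, and `step` is any callable supplied by the caller.

```python
def stats_delta(before, after):
    """Return the entries whose value changed between two stats dicts."""
    delta = {}
    for key in sorted(set(before) | set(after)):
        change = after.get(key, 0) - before.get(key, 0)
        if change != 0:
            delta[key] = change
    return delta


def device_memory_delta(rbln, step, device=None):
    """Run step() and report how the device memory statistics changed."""
    before = rbln.memory_stats(device)
    result = step()
    after = rbln.memory_stats(device)
    return result, stats_delta(before, after)


# Usage on an RBLN machine (not executed here):
#   import torch
#   _, delta = device_memory_delta(torch.rbln, lambda: model(inputs))
```

If `step` only creates tensors without running a device operation, the delta may be empty, since device memory has not yet been acquired.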
reset_peak_memory_stats(device=None)
Reset the "peak" stats tracked by the caching allocator for a given device.
This function resets the peak values to their current values for the following stats:
- allocated.peak: Reset to allocated.current
- reserved.peak: Reset to reserved.current
- active.peak: Reset to active.current
- cached.peak: Reset to cached.current
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
device
|
int, str, or torch.device
|
The device to reset stats for. If None, uses the current device. Defaults to None. |
None
|
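A common use of this reset is to measure the peak memory of one region of code. The sketch below resets the peak, runs a caller-supplied function, and reads `max_memory_allocated` afterwards; `peak_memory_during` is a hypothetical helper and `rbln` stands for `torch.rbln`.

```python
def peak_memory_during(rbln, fn, device=None):
    """Run fn() and return (result, peak bytes occupied by tensors).

    The peak starts from the currently allocated amount, because the reset
    sets allocated.peak to allocated.current rather than to zero.
    """
    rbln.reset_peak_memory_stats(device)
    result = fn()
    return result, rbln.max_memory_allocated(device)


# Usage on an RBLN machine (not executed here):
#   import torch
#   out, peak = peak_memory_during(torch.rbln, lambda: model(inputs))
```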
reset_accumulated_memory_stats(device=None)
Reset the "accumulated" (historical) stats tracked by the caching allocator for a given device.
This function resets the following accumulated stats to zero:
- allocated.total_allocated
- allocated.total_freed
- reserved.total_allocated
- reserved.total_freed
- num_alloc_retries
- num_ooms
- num_device_alloc
- num_device_free
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
device
|
int, str, or torch.device
|
The device to reset stats for. If None, uses the current device. Defaults to None. |
None
|
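Resetting the accumulated counters makes it possible to count allocator events for a single region of code. This is a sketch under the assumption that the counter names listed above appear as keys in the dictionary returned by `memory_stats`; `allocator_events_during` is a hypothetical helper and `rbln` stands for `torch.rbln`.

```python
def allocator_events_during(rbln, fn, device=None):
    """Run fn() and return (result, event counters accumulated during it)."""
    rbln.reset_accumulated_memory_stats(device)
    result = fn()
    stats = rbln.memory_stats(device)
    counters = ("num_device_alloc", "num_device_free",
                "num_alloc_retries", "num_ooms")
    # Missing keys are reported as 0 rather than raising KeyError.
    return result, {name: stats.get(name, 0) for name in counters}


# Usage on an RBLN machine (not executed here):
#   import torch
#   _, events = allocator_events_during(torch.rbln, lambda: model(inputs))
```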