
RBLN NPU Architecture

ATOM™ Architecture

ATOM™ is a multi-core System-on-Chip (SoC) that integrates all essential components for running deep neural networks into a single chip. It combines Neural Engines, a Command Processor, on-chip local and global scratchpad memory hierarchy, Network-on-Chip (NoC) bus fabric, and PCIe 5.0 and GDDR6 interfaces into a compact design. ATOM™ features 4MB of local SRAM (Scratch Pad) within each Neural Engine, 32MB of global SRAM (Shared Memory) shared across all Neural Engines, and 16GB of off-chip GDDR6 DRAM. This hierarchical structure of on-chip and off-chip memory reduces execution time while minimizing energy consumption for memory access.
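
The sizes above determine whether a given model can run on a single device. Below is a minimal Python sketch of the memory hierarchy described in this section; the sizes come from the text, while the helper function and names are hypothetical and not part of the RBLN SDK.

```python
# Memory hierarchy sizes as stated above (per ATOM(TM) device).
SCRATCH_PAD_PER_ENGINE = 4 * 1024**2   # 4MB local SRAM inside each Neural Engine
SHARED_MEMORY          = 32 * 1024**2  # 32MB global SRAM shared by all Neural Engines
DEVICE_DRAM            = 16 * 1024**3  # 16GB off-chip GDDR6 DRAM

def fits_on_single_device(model_bytes: int) -> bool:
    """Return True if the model's weights fit in one device's DRAM."""
    return model_bytes <= DEVICE_DRAM

# Example: a 13B-parameter model in FP16 (2 bytes per parameter) exceeds
# a single device's 16GB DRAM, so it must be partitioned (see RSD below).
weights = 13_000_000_000 * 2
print(fits_on_single_device(weights))  # False
```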

For large-scale models, such as large language models (LLMs) that often require more than 16GB of memory, a single ATOM™ device may not be able to load the entire model at once. In this case, the model is partitioned across multiple devices for parallel execution through the Rebellions Scalable Design (RSD) architecture. During this process, the devices must communicate and synchronize data to ensure correct execution.
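
As a hedged illustration of the idea behind RSD partitioning (not the RBLN Compiler's actual algorithm), the sketch below splits a model's layers into contiguous chunks, one per device; the layer names and helper are invented for this example.

```python
# Conceptual sketch: assign contiguous, near-equal chunks of layers to devices.
def partition_layers(layers: list, num_devices: int) -> list[list]:
    """Split layers into num_devices contiguous chunks of near-equal size."""
    chunk = -(-len(layers) // num_devices)  # ceiling division
    return [layers[i:i + chunk] for i in range(0, len(layers), chunk)]

layers = [f"decoder_block_{i}" for i in range(32)]
for dev, part in enumerate(partition_layers(layers, 4)):
    print(f"device {dev}: {part[0]} .. {part[-1]}")  # 8 blocks per device
```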

For more detailed information, please refer to our white papers.

Commands

RBLN Profiler provides a total of seven commands: Host, Neural Engine Clusters, Neural DMA, Task DMA, External HDMA, Device HDMA, and Device Sync. Each command represents a specific operation, such as computation, data movement, or signal transmission between hardware components. The Profiler chronologically traces these commands and visualizes them using Perfetto. More detailed information about Perfetto can be found in the Introduction to Perfetto. To effectively analyze profiling results, refer to the image below, which illustrates the connections between hardware components and the commands described above.
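
As a hedged illustration of what such a chronological trace looks like, the snippet below writes a few command events in the Chrome JSON trace format, which the Perfetto UI can open. The event names match the commands above, but the timestamps, durations, and track IDs are invented for illustration.

```python
import json

# One complete event ("ph": "X") per command, on separate tracks (tid).
events = [
    {"name": "External HDMA",          "ph": "X", "ts": 0,   "dur": 120, "pid": 0, "tid": 0},
    {"name": "Neural DMA",             "ph": "X", "ts": 130, "dur": 40,  "pid": 0, "tid": 1},
    {"name": "Neural Engine Clusters", "ph": "X", "ts": 175, "dur": 300, "pid": 0, "tid": 2},
    {"name": "Device Sync",            "ph": "X", "ts": 480, "dur": 15,  "pid": 0, "tid": 3},
]

with open("trace.json", "w") as f:
    json.dump({"traceEvents": events}, f)
# Open trace.json at https://ui.perfetto.dev to view the timeline.
```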

Host

The Host command represents operations that are offloaded to the host CPU when they are either more efficient than running on the NPU or not supported by the NPU.
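
A minimal sketch of the placement decision this describes, assuming a hypothetical supported-op set; the real decision also weighs efficiency, and none of the names below come from the RBLN SDK.

```python
# Hypothetical set of ops the NPU can execute directly.
NPU_SUPPORTED_OPS = {"conv2d", "matmul", "softmax", "layernorm"}

def placement(op_name: str) -> str:
    """Run on the NPU when supported; otherwise fall back to the host CPU."""
    return "npu" if op_name in NPU_SUPPORTED_OPS else "host"

print(placement("matmul"))      # npu
print(placement("custom_op"))   # host (offloaded to the host CPU)
```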

Neural Engine Clusters

The Neural Engine Clusters command represents operations that run on the Neural Engines in ATOM™. The Neural Engines are designed for computation with a focus on low latency, high utilization, and flexibility. The list of operations supported by the Neural Engines is summarized in the Supported OPs.

Neural DMA

The Neural DMA command represents data transfers between the device DRAM and the Neural Engine's Scratch Pad, including program binaries, input tensors, and kernel weights of the target model. The Neural DMA command can operate simultaneously with other commands, so the RBLN Compiler generates the required dependencies between the commands to ensure correct execution. These dependencies are managed by the Neural Engine's Task Manager at runtime.
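
The sketch below illustrates this dependency handling in a hedged way: commands carry compiler-generated dependencies, and a runtime loop (standing in for the Task Manager) issues a command only once all of its dependencies have completed. The classes and scheduler are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Command:
    name: str
    deps: list["Command"] = field(default_factory=list)
    done: bool = False

def run(commands: list[Command]) -> None:
    """Issue each command only after all of its dependencies complete."""
    pending = list(commands)
    while pending:
        for cmd in pending:
            if all(d.done for d in cmd.deps):  # dependencies satisfied?
                print(f"issue {cmd.name}")
                cmd.done = True
                pending.remove(cmd)
                break
        else:
            raise RuntimeError("dependency cycle")

load_weights = Command("Neural DMA: weights -> Scratch Pad")
compute      = Command("Neural Engine: matmul", deps=[load_weights])
run([compute, load_weights])  # issues the DMA first, then the compute
```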

Task DMA

The Task DMA command represents data transfers between the device DRAM and the Shared Memory in the SoC, including input tensors, intermediate tensors, and kernel weights of the target model. The RBLN Compiler leverages both command types, Task DMA and Neural DMA, to achieve optimal performance for the target workloads.
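
A minimal sketch of how the two DMA paths differ by endpoint, assuming hypothetical endpoint names: Neural DMA serves a Neural Engine's Scratch Pad, while Task DMA serves the on-chip Shared Memory.

```python
def dma_engine(src: str, dst: str) -> str:
    """Pick the DMA command type based on the on-chip endpoint involved."""
    endpoints = {src, dst}
    if "scratch_pad" in endpoints:
        return "Neural DMA"    # device DRAM <-> Neural Engine Scratch Pad
    if "shared_memory" in endpoints:
        return "Task DMA"      # device DRAM <-> Shared Memory
    raise ValueError("unknown transfer path")

print(dma_engine("dram", "scratch_pad"))    # Neural DMA
print(dma_engine("dram", "shared_memory"))  # Task DMA
```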

External/Device HDMA

The External HDMA command represents data transfers between the host DRAM and the device DRAM, while the Device HDMA command represents data transfers between the device DRAMs or the Shared Memories of different devices under the RSD configuration.
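
The distinction can be summarized by the transfer endpoints. The sketch below, with hypothetical endpoint names, classifies a transfer as External HDMA when it crosses the host/device boundary and as Device HDMA otherwise.

```python
def hdma_kind(src: str, dst: str) -> str:
    """External HDMA crosses the host boundary; Device HDMA stays device-to-device."""
    if "host_dram" in (src, dst):
        return "External HDMA"
    return "Device HDMA"

print(hdma_kind("host_dram", "dev0_dram"))  # External HDMA
print(hdma_kind("dev0_dram", "dev1_dram"))  # Device HDMA (RSD configuration)
```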

Device Sync

The Device Sync command represents synchronization between different devices under the RSD configuration. The RBLN Compiler ensures data synchronization across multiple devices while minimizing inter-device communication overhead and maximizing effective memory bandwidth. Device Sync commands generated by the RBLN Compiler are managed by the Command Processor at runtime to verify that data communication is successfully completed and to enable the immediate execution of subsequent commands.
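
As a hedged sketch of the role a Device Sync command plays, the snippet below uses a thread event to stand in for the Command Processor's completion check: the dependent command waits until the inter-device transfer has finished, then starts immediately. All names are illustrative, not SDK APIs.

```python
import threading

transfer_done = threading.Event()

def device_hdma():
    # ... Device HDMA: move a tensor between device DRAMs ...
    transfer_done.set()   # signal that the transfer completed

def device_sync_then_compute():
    transfer_done.wait()  # Device Sync: block until the data has arrived
    print("dependent command starts immediately after sync")

consumer = threading.Thread(target=device_sync_then_compute)
producer = threading.Thread(target=device_hdma)
consumer.start()
producer.start()
producer.join()
consumer.join()
```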