Docker Support
RBLN NPUs can be used within a Docker container for streamlined deployment and management.
If a server has multiple RBLN NPUs, each container can be configured to use a specific NPU, enabling efficient resource allocation in multi-NPU environments.
Rebellions Scalable Design (RSD)
Rebellions Scalable Design (RSD) is a technology that enables efficient management and utilization of multiple RBLN NPUs. RSD provides:
- Device Grouping: NPUs are organized into groups for better resource management
- Tensor Parallelism: Large language models can be distributed across multiple NPUs within the same group
- Fault Isolation: Separate groups ensure that issues in one group don't affect others
- Independent Scheduling: Each group can be managed independently for optimal performance
All RBLN NPUs, whether used individually or in multi-NPU configurations, operate through the RSD. This means that even single NPU containers benefit from RSD's management capabilities.
Note
The rbln-smi tool, which assists in managing and monitoring the status of RBLN NPUs, is included in the RBLN driver package. By mounting rbln-smi when creating a Docker container, you can monitor the status of the assigned RBLN NPU from within the container.
Container Creation Guidelines
This section summarizes common requirements and options when using RBLN NPUs with Docker.
RSD Group Management
Check RSD Group Configuration
| $ rbln-smi -g
+-------------------------------------------------------------------------------------------------------+
| RSD Management Group KMD ver: 2.0.1 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| Grp | NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+=====+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 0 | 0 | RBLN-CA22 | rbln0 | 0000:1b:00.0 | 37C | 19.5W | P14 | 0.0B / 15.7GiB | 0.0 |
| | 1 | RBLN-CA22 | rbln1 | 0000:1c:00.0 | 36C | 19.2W | P14 | 0.0B / 15.7GiB | 0.0 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
+-------------------------------------------------------------------------------------------------------+
| RSD Context Information |
+-----+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
| Grp | NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
+=====+=====+=====================+==============+===========+==========+======+===============+========+
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
+-----+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
|
By default, all NPUs are shown under group 0 (as indicated in the Grp column). For optimal container isolation and performance, you should create a separate group for each container.
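Create an RSD Group
Create a dedicated RSD group for each container, where <group_id> is the ID of the new group and <device_id_list> is a comma-separated list of NPU device IDs (for example, 4,5,6,7):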
| $ sudo rbln-smi group -c <group_id> -a <device_id_list>
|
Delete an RSD Group
You can destroy a group you created with the following command:
| $ sudo rbln-smi group -d <group_id>
|
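As a convenience, the device ID list for a contiguous range of NPUs can be generated rather than typed by hand. The following is a minimal sketch, not part of the RBLN tooling: device_list and print_group_create are hypothetical helper names, and the script only prints the rbln-smi command shown above so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated ID list for an inclusive range.
device_list() (
    i=$1
    list=""
    while [ "$i" -le "$2" ]; do
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    echo "$list"
)

# Hypothetical helper: print (do not execute) the group-creation command.
print_group_create() (
    echo "sudo rbln-smi group -c $1 -a $(device_list "$2" "$3")"
)

# Example: group 1 containing devices 4 through 7.
print_group_create 1 4 7
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without NPUs; copy the printed line into a terminal once it looks right.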
Accessing NPUs in Containers
To grant a Docker container access to the host's RBLN NPUs, run your container with the following options.
| $ docker run --device /dev/rsdX:/dev/rsd0 \
--device /dev/rblnX:/dev/rbln0 \
--volume /usr/local/bin/rbln-smi:/usr/local/bin/rbln-smi \
-ti IMAGE_NAME:TAG
|
The purpose of each option is as follows. Here, X stands for the host-side RSD group ID (in /dev/rsdX) or NPU device index (in /dev/rblnX):
--device /dev/rsdX:/dev/rsd0
- This exposes the RSD group interface to the container.
--device /dev/rblnX:/dev/rbln0
- This exposes individual RBLN NPU devices to the container.
--volume /usr/local/bin/rbln-smi:/usr/local/bin/rbln-smi
- This option enables the use of the rbln-smi monitoring tool within the container.
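The --device options follow a regular pattern: the host group node is mapped to /dev/rsd0, and each host NPU node is mapped to the next free /dev/rblnN index inside the container. The sketch below generates those flags for a given group ID and list of host device indices. The function name device_flags is hypothetical, and the renumbering from 0 simply mirrors the commands shown on this page rather than a documented requirement.

```shell
#!/bin/sh
# Hypothetical helper: print the --device flags for one RSD group.
# $1 is the host RSD group ID; remaining arguments are host NPU indices.
device_flags() (
    group=$1
    shift
    flags="--device /dev/rsd${group}:/dev/rsd0"
    n=0
    for host_idx in "$@"; do
        # Map each host device to the next container-side index (0, 1, 2, ...).
        flags="$flags --device /dev/rbln${host_idx}:/dev/rbln${n}"
        n=$((n + 1))
    done
    echo "$flags"
)

# Example: group 1 holding only rbln1.
device_flags 1 1
```

The printed flags can be pasted into a docker run command alongside the --volume option for rbln-smi.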
Allocate a Single RBLN NPU
This section demonstrates how to configure a Docker container to use a single RBLN NPU with proper RSD group isolation.
Important
Before proceeding, read the Container Creation Guidelines above to understand the common requirements and options referenced in this step-by-step guide.
Step 1: Create an RSD Group
Create a separate RSD group for your single NPU container:
| $ sudo rbln-smi group -c 1 -a 1
|
This command creates group 1 and assigns rbln1 to it, removing it from the default group 0.
Step 2: Verify RSD Group Configuration
Run rbln-smi -g again to verify that the NPU is now in its dedicated group.
Your output should show the NPU in its own group:
| +-------------------------------------------------------------------------------------------------------+
| RSD Management Group KMD ver: 2.0.1 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| Grp | NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+=====+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 0 | 0 | RBLN-CA22 | rbln0 | 0000:1b:00.0 | 35C | 18.4W | P14 | 0.0B / 15.7GiB | 0.0 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| 1 | 0 | RBLN-CA22 | rbln1 | 0000:1c:00.0 | 34C | 18.2W | P14 | 0.0B / 15.7GiB | 0.0 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
+-------------------------------------------------------------------------------------------------------+
| RSD Context Information |
+-----+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
| Grp | NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
+=====+=====+=====================+==============+===========+==========+======+===============+========+
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
+-----+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
|
Notice that rbln1 is now in group 1, separate from rbln0 in group 0.
Step 3: Run Docker Container with Single NPU
Run a Docker container with access to the dedicated NPU:
| $ docker run --device /dev/rsd1:/dev/rsd0 \
--device /dev/rbln1:/dev/rbln0 \
--volume /usr/local/bin/rbln-smi:/usr/local/bin/rbln-smi \
-ti IMAGE_NAME:TAG
|
The container now has exclusive access to rbln1 through RSD group 1, ensuring optimal performance and isolation.
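Once the container is running, a quick sanity check is to confirm that the mapped device nodes are visible inside it. The sketch below assumes the container-side names /dev/rsd0 and /dev/rbln0 from the command above; check_npu_nodes is a hypothetical function name, and the optional directory argument exists only so the check can be exercised outside a container.

```shell
#!/bin/sh
# Hypothetical helper: report whether the expected device nodes exist.
# $1 is the directory to inspect (defaults to /dev).
check_npu_nodes() (
    dev_dir=${1:-/dev}
    status=0
    for node in rsd0 rbln0; do
        if [ -e "$dev_dir/$node" ]; then
            echo "found   $dev_dir/$node"
        else
            echo "missing $dev_dir/$node"
            status=1
        fi
    done
    # Non-zero status means at least one node was not found.
    return "$status"
)
```

Inside the container, calling check_npu_nodes with no argument inspects /dev. If both nodes are found, rbln-smi can then be used to view the status of the assigned NPU.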
Allocate Multiple RBLN NPUs
This section demonstrates how to configure Docker containers to use multiple RBLN NPUs. Multiple NPUs can be used for large language models that leverage tensor parallelism.
A list of models that support RSD can be found in Optimum RBLN Multi-NPU Supported Models.
Container Allocation Strategy
You can create multiple containers, each with a dedicated NPU group. For example, with 8 RBLN NPUs, you can create 2 containers, each using 4 NPUs assigned to its own RSD group.
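The split described above can be worked out mechanically. The following sketch partitions a total number of NPUs into equally sized, contiguous groups and prints one line per container in the form "group ID, device ID list". The function name plan_groups is hypothetical, and contiguous numbering is an assumption made for illustration; check the actual device layout with rbln-smi before creating groups.

```shell
#!/bin/sh
# Hypothetical helper: print "<group_id> <device_id_list>" for each container.
# $1 is the total number of NPUs; $2 is the number of NPUs per container.
plan_groups() (
    total=$1
    per_group=$2
    if [ "$per_group" -le 0 ] || [ $((total % per_group)) -ne 0 ]; then
        echo "total must be a positive multiple of per_group" >&2
        return 1
    fi
    group=0
    first=0
    while [ "$first" -lt "$total" ]; do
        list=""
        i=$first
        while [ "$i" -lt $((first + per_group)) ]; do
            list="${list:+$list,}$i"
            i=$((i + 1))
        done
        echo "$group $list"
        group=$((group + 1))
        first=$((first + per_group))
    done
)

# Example: 8 NPUs split into containers of 4.
plan_groups 8 4
```

Because all NPUs start in group 0 by default, only the lines for group 1 and above need a corresponding rbln-smi group -c command.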
Important
Before proceeding, read the Container Creation Guidelines above to understand the common requirements and options referenced in this step-by-step guide.
Note
A single RBLN-CA25 NPU card consists of 4 devices. Check with rbln-smi and allocate all 4 devices to the same container.
Step 1: Create an RSD Group
Ensure all NPUs that will work together in a single container are in the same RSD group.
For example, if you have 8 devices and want to create a container with 4 NPUs:
| $ sudo rbln-smi group -c 1 -a 4,5,6,7
|
Step 2: Verify Group Configuration
Run rbln-smi -g again to verify that the NPUs are properly grouped.
Your output should show the NPUs organized in separate groups:
| +-------------------------------------------------------------------------------------------------------+
| RSD Management Group KMD ver: 2.0.1 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| Grp | NPU | Name | Device | PCI BUS ID | Temp | Power | Perf | Memory(used/total) | Util |
+=====+=====+===========+=========+===============+======+=========+======+=====================+=======+
| 0 | 0 | RBLN-CA22 | rbln0 | 0000:1b:00.0 | 38C | 19.5W | P14 | 0.0B / 15.7GiB | 0.0 |
| | 1 | RBLN-CA22 | rbln1 | 0000:1c:00.0 | 37C | 19.3W | P14 | 0.0B / 15.7GiB | 0.0 |
| | 2 | RBLN-CA22 | rbln2 | 0000:1f:00.0 | 36C | 18.5W | P14 | 0.0B / 15.7GiB | 0.0 |
| | 3 | RBLN-CA22 | rbln3 | 0000:22:00.0 | 39C | 19.6W | P14 | 0.0B / 15.7GiB | 0.0 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
| 1 | 0 | RBLN-CA22 | rbln4 | 0000:41:00.0 | 36C | 17.9W | P14 | 0.0B / 15.7GiB | 0.0 |
| | 1 | RBLN-CA22 | rbln5 | 0000:42:00.0 | 36C | 21.5W | P14 | 0.0B / 15.7GiB | 0.0 |
| | 2 | RBLN-CA22 | rbln6 | 0000:45:00.0 | 40C | 20.6W | P14 | 0.0B / 15.7GiB | 0.0 |
| | 3 | RBLN-CA22 | rbln7 | 0000:48:00.0 | 36C | 17.9W | P14 | 0.0B / 15.7GiB | 0.0 |
+-----+-----+-----------+---------+---------------+------+---------+------+---------------------+-------+
+-------------------------------------------------------------------------------------------------------+
| RSD Context Information |
+-----+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
| Grp | NPU | Process | PID | CTX | Priority | PTID | Memalloc | Status |
+=====+=====+=====================+==============+===========+==========+======+===============+========+
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
+-----+-----+---------------------+--------------+-----------+----------+------+---------------+--------+
|
Notice that rbln4~rbln7 are now in group 1, separate from rbln0~rbln3 in group 0.
Step 3: Run Docker Containers with Multiple NPUs
Container 1: Assign rbln0~rbln3 (Group 0)
| $ docker run \
--device /dev/rsd0 \
--device /dev/rbln0 --device /dev/rbln1 \
--device /dev/rbln2 --device /dev/rbln3 \
--volume /usr/local/bin/rbln-smi:/usr/local/bin/rbln-smi \
-ti IMAGE_NAME:TAG
|
Container 2: Assign rbln4~rbln7 (Group 1)
| $ docker run \
--device /dev/rsd1:/dev/rsd0 \
--device /dev/rbln4:/dev/rbln0 --device /dev/rbln5:/dev/rbln1 \
--device /dev/rbln6:/dev/rbln2 --device /dev/rbln7:/dev/rbln3 \
--volume /usr/local/bin/rbln-smi:/usr/local/bin/rbln-smi \
-ti IMAGE_NAME:TAG
|
In this setup, each Docker container can utilize 4 RBLN NPUs:
- Container 1: Uses rbln0~rbln3 through RSD group 0 (/dev/rsd0)
- Container 2: Uses rbln4~rbln7 through RSD group 1 (/dev/rsd1)