
Common

Key Classes

  • RBLNModel: The main model class for running HuggingFace pre-trained models on RBLN NPUs
  • RBLNModelConfig: The configuration class for HuggingFace pre-trained models

API Reference

Classes

RBLNAutoConfig

Functions

load(path, passed_rbln_config=None, kwargs={}, return_unused_kwargs=False) staticmethod

Load a RBLNModelConfig from a path. The class name is automatically inferred from the rbln_config.json file.

Parameters:

  • path (str, required): Path to the RBLNModelConfig.
  • passed_rbln_config (Optional[RBLNModelConfig], default None): An RBLNModelConfig whose runtime options are passed through to the loaded configuration.
  • kwargs (Optional[Dict[str, Any]], default {}): Additional keyword arguments for runtime options.
  • return_unused_kwargs (bool, default False): Whether to return unused keyword arguments.

Returns:

  • RBLNModelConfig (RBLNModelConfig): The loaded RBLNModelConfig.
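A minimal usage sketch (the "./compiled_model" path is a hypothetical directory containing a saved model and its rbln_config.json):

```python
from optimum.rbln import RBLNAutoConfig

# The concrete configuration class (e.g. RBLNResNetForImageClassificationConfig)
# is inferred automatically from the rbln_config.json in the directory.
config = RBLNAutoConfig.load("./compiled_model")

# Any keyword arguments not consumed as runtime options can be
# returned for inspection.
config, unused = RBLNAutoConfig.load("./compiled_model", return_unused_kwargs=True)
```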

RBLNModelConfig

Base configuration class for RBLN models that handles compilation settings, runtime options, and submodules.

This class provides functionality for:

  1. Managing compilation configurations for RBLN devices
  2. Configuring runtime behavior such as device placement
  3. Handling nested configuration objects for complex model architectures
  4. Serializing and deserializing configurations

Examples:

Using with RBLNModel.from_pretrained():

from optimum.rbln import RBLNResNetForImageClassification

# Method 1: Using rbln_ prefixed arguments (recommended for simple cases)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,  # Compile the model
    rbln_image_size=224,
    rbln_batch_size=16,
    rbln_create_runtimes=True,
    rbln_device=0
)

# Method 2: Using a config dictionary
rbln_config_dict = {
    "image_size": 224,
    "batch_size": 16,
    "create_runtimes": True
}
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=rbln_config_dict
)

# Method 3: Using a RBLNModelConfig instance
from optimum.rbln import RBLNResNetForImageClassificationConfig

config = RBLNResNetForImageClassificationConfig(
    image_size=224,
    batch_size=16,
    create_runtimes=True
)

model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config
)

# Method 4: Combining a config object with override parameters
# (rbln_ prefixed parameters take precedence over rbln_config values)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config,
    rbln_image_size=320,  # This overrides the value in config
    rbln_device=1         # This sets a new value
)

Save and load configuration:

# Save to disk
config.save("/path/to/model")

# Using AutoConfig
loaded_config = RBLNAutoConfig.load("/path/to/model")

Converting between configuration formats:

# Converting a dictionary to a config instance
config_dict = {
    "image_size": 224,
    "batch_size": 8,
    "create_runtimes": True
}
config = RBLNResNetForImageClassificationConfig(**config_dict)

Configuration for language models:

from optimum.rbln import RBLNLlamaForCausalLMConfig

# Configure a Llama model for RBLN
config = RBLNLlamaForCausalLMConfig(
    max_seq_len=4096,
    device=[0, 1, 2, 3],
    tensor_parallel_size=4  # For multi-NPU parallel inference
)

Working with models that have submodules:

from optimum.rbln import RBLNLlavaNextForConditionalGeneration

# Configuring a model with submodules
# LlavaNext has a vision_tower and a language_model submodule
model = RBLNLlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    export=True,
    rbln_config={
        # Main model configuration (the projector, which is not a submodule)
        "create_runtimes": True,
        "device": 0,

        # Submodule configurations as nested dictionaries
        "vision_tower": {
            "image_size": 336,
        },
        "language_model": {
            "tensor_parallel_size": 4,  # Distribute across 4 NPUs
            "max_seq_len": 8192,
            "use_inputs_embeds": True,
            "batch_size": 1,
        },
    },
)

Advanced multi-device deployment with tensor parallelism:

from optimum.rbln import RBLNLlamaForCausalLMConfig

# Setup a complex multi-device configuration for large language models
llm_config = RBLNLlamaForCausalLMConfig(
    # Split model across 8 NPUs
    tensor_parallel_size=8,

    # Runtime options
    device=[8, 9, 10, 11, 12, 13, 14, 15],
    create_runtimes=True,
    activate_profiler=True,  # Enable profiling for performance analysis

    # Model-specific parameters for the LLM
    max_seq_len=131072,
    batch_size=4,
    attn_impl="flash_attn",
)

Compilation without runtime creation (create_runtimes=False):

from optimum.rbln import RBLNLlamaForCausalLM, RBLNLlamaForCausalLMConfig

# Compile a model on a machine without NPU or for later use
config = RBLNLlamaForCausalLMConfig(
    create_runtimes=False,  # Compile only, don't create runtime
    npu="RBLN-CA25",  # Specify target NPU for compilation
    max_seq_len=4096,
    tensor_parallel_size=4,
    batch_size=1
)

# Export the model - will compile but not create runtimes
model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    rbln_config=config
)

# Save the compiled model for later use on NPU
model.save_pretrained("./compiled_llama_model")

# Later, on a machine with the target NPU
inference_model = RBLNLlamaForCausalLM.from_pretrained(
    "./compiled_llama_model",
    rbln_create_runtimes=True,  # Optional; defaults to True when an NPU is available
)

Two-stage workflow with separate compilation and runtime:

from optimum.rbln import RBLNResNetForImageClassification

# Stage 1: Model engineer compiles model (can be on any machine)
def compile_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "microsoft/resnet-50",
        export=True,
        rbln_create_runtimes=False,
        rbln_npu="RBLN-CA25",
        rbln_image_size=224
    )
    model.save_pretrained("./compiled_model")
    print("Model compiled and saved, ready for deployment")

# Stage 2: Deployment engineer loads model on NPU
def deploy_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "./compiled_model",
        rbln_create_runtimes=True,
    )
    print("Model loaded and ready for inference")
    return model

Functions

__init__(cls_name=None, create_runtimes=None, optimize_host_memory=None, device=None, device_map=None, activate_profiler=None, npu=None, tensor_parallel_size=None, optimum_rbln_version=None, _compile_cfgs=[], **kwargs)

Initialize a RBLN model configuration with runtime options and compile configurations.

Parameters:

  • cls_name (Optional[str], default None): The class name of the configuration. Defaults to the current class name.
  • create_runtimes (Optional[bool], default None): Whether to create RBLN runtimes. Defaults to True if an NPU is available.
  • optimize_host_memory (Optional[bool], default None): Whether to optimize host memory usage. Defaults to True.
  • device (Optional[Union[int, List[int]]], default None): The device(s) to load the model onto. Can be a single device ID or a list.
  • device_map (Optional[Dict[str, Union[int, List[int]]]], default None): Mapping from compiled model names to device IDs.
  • activate_profiler (Optional[bool], default None): Whether to activate the profiler for performance analysis.
  • npu (Optional[str], default None): The NPU device name to use for compilation.
  • tensor_parallel_size (Optional[int], default None): Size for tensor parallelism to distribute the model across devices.
  • optimum_rbln_version (Optional[str], default None): The optimum-rbln version used for this configuration.
  • _compile_cfgs (List[RBLNCompileConfig], default []): List of compilation configurations for the model.
  • **kwargs (Dict[str, Any], default {}): Additional keyword arguments.

Raises:

  • ValueError: If unexpected keyword arguments are provided.

Classes

RBLNModel

Functions

from_pretrained(model_id, export=False, rbln_config=None, **kwargs) classmethod

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Users can call it to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model that runs on RBLN NPUs.

Parameters:

  • model_id (Union[str, Path], required): The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler.
  • export (bool, default False): Whether the model should be compiled.
  • rbln_config (Optional[Union[Dict, RBLNModelConfig]], default None): Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.
  • kwargs (Dict[str, Any], default {}): Additional keyword arguments. Arguments prefixed with 'rbln_' are passed to rbln_config; the remaining arguments are passed to the HuggingFace library.

Returns:

  • Self: A RBLN model instance ready for inference on RBLN NPU devices.

from_model(model, *, rbln_config=None, **kwargs) classmethod

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

  • model (PreTrainedModel, required): The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class.
  • rbln_config (Optional[Union[Dict, RBLNModelConfig]], default None): Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.
  • kwargs (Dict[str, Any], default {}): Additional keyword arguments. Arguments prefixed with 'rbln_' are passed to rbln_config; the remaining arguments are passed to the HuggingFace library.

The method performs the following steps:

  1. Compiles the PyTorch model into an optimized RBLN graph
  2. Configures the model for the specified NPU device
  3. Creates the necessary runtime objects if requested
  4. Saves the compiled model and configurations

Returns:

  • Self: A RBLN model instance ready for inference on RBLN NPU devices.
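Where from_pretrained() starts from a model id, from_model() starts from an already-instantiated PyTorch model, which is useful when the model has been modified in memory before compilation. A minimal sketch (the rbln_config keys mirror the ResNet examples above and are illustrative):

```python
from transformers import ResNetForImageClassification
from optimum.rbln import RBLNResNetForImageClassification

# Instantiate (and optionally modify) the HuggingFace model first
torch_model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

# Then compile the in-memory model into a RBLN model
rbln_model = RBLNResNetForImageClassification.from_model(
    torch_model,
    rbln_config={"image_size": 224, "batch_size": 1},
)
```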

save_pretrained(save_directory)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

  • save_directory (Union[str, PathLike], required): The directory to save the model and its configuration files. Will be created if it doesn't exist.