
Common

Key Classes

  • RBLNModel: The main model class for running HuggingFace pre-trained models on RBLN NPUs
  • RBLNModelConfig: The configuration class for HuggingFace pre-trained models

API Reference

Classes

RBLNAutoConfig

Functions

load(path, passed_rbln_config=None, kwargs={}, return_unused_kwargs=False) staticmethod

Load a RBLNModelConfig from a path. The class name is automatically inferred from the rbln_config.json file.

Parameters:

  • path (str, required): Path to the RBLNModelConfig.
  • passed_rbln_config (Optional[RBLNModelConfig], default None): An RBLNModelConfig whose runtime options are passed through to the loaded configuration.
  • kwargs (Optional[Dict[str, Any]], default {}): Additional keyword arguments for runtime options.
  • return_unused_kwargs (bool, default False): Whether to return unused keyword arguments.

Returns:

  • RBLNModelConfig (RBLNModelConfig): The loaded RBLNModelConfig.
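A minimal usage sketch (the "./compiled_model" path is a hypothetical directory containing a saved model and its rbln_config.json):

```python
from optimum.rbln import RBLNAutoConfig

# The concrete configuration class (e.g. RBLNResNetForImageClassificationConfig)
# is inferred automatically from the rbln_config.json in the directory.
config = RBLNAutoConfig.load("./compiled_model")

# Any keyword arguments not consumed as runtime options can be
# returned for inspection.
config, unused = RBLNAutoConfig.load("./compiled_model", return_unused_kwargs=True)
```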

RBLNModelConfig

Base configuration class for RBLN models that handles compilation settings, runtime options, and submodules.

This class provides functionality for:

  1. Managing compilation configurations for RBLN devices
  2. Configuring runtime behavior such as device placement
  3. Handling nested configuration objects for complex model architectures
  4. Serializing and deserializing configurations

Examples:

Using with RBLNModel.from_pretrained():

from optimum.rbln import RBLNResNetForImageClassification

# Method 1: Using rbln_ prefixed arguments (recommended for simple cases)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,  # Compile the model
    rbln_image_size=224,
    rbln_batch_size=16,
    rbln_create_runtimes=True,
    rbln_device=0
)

# Method 2: Using a config dictionary
rbln_config_dict = {
    "image_size": 224,
    "batch_size": 16,
    "create_runtimes": True
}
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=rbln_config_dict
)

# Method 3: Using a RBLNModelConfig instance
from optimum.rbln import RBLNResNetForImageClassificationConfig

config = RBLNResNetForImageClassificationConfig(
    image_size=224,
    batch_size=16,
    create_runtimes=True
)

model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config
)

# Method 4: Combining a config object with override parameters
# (rbln_ prefixed parameters take precedence over rbln_config values)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config,
    rbln_image_size=320,  # This overrides the value in config
    rbln_device=1         # This sets a new value
)

Save and load configuration:

# Save to disk
config.save("/path/to/model")

# Using AutoConfig
loaded_config = RBLNAutoConfig.load("/path/to/model")

Converting between configuration formats:

# Converting a dictionary to a config instance
config_dict = {
    "image_size": 224,
    "batch_size": 8,
    "create_runtimes": True
}
config = RBLNResNetForImageClassificationConfig(**config_dict)

Configuration for language models:

from optimum.rbln import RBLNLlamaForCausalLMConfig

# Configure a Llama model for RBLN
config = RBLNLlamaForCausalLMConfig(
    max_seq_len=4096,
    device=[0, 1, 2, 3],
    tensor_parallel_size=4  # For multi-NPU parallel inference
)

Working with models that have submodules:

from optimum.rbln import RBLNLlavaNextForConditionalGeneration

# Configuring a model with submodules
# LlavaNext has a vision_tower and a language_model submodule
model = RBLNLlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    export=True,
    rbln_config={
        # Main model configuration (the projector, which is not a submodule)
        "create_runtimes": True,
        "device": 0,

        # Submodule configurations as nested dictionaries
        "vision_tower": {
            "image_size": 336,
        },
        "language_model": {
            "tensor_parallel_size": 4,  # Distribute across 4 NPUs
            "max_seq_len": 8192,
            "use_inputs_embeds": True,
            "batch_size": 1,
        },
    },
)

Advanced multi-device deployment with tensor parallelism:

from optimum.rbln import RBLNLlamaForCausalLMConfig

# Setup a complex multi-device configuration for large language models
llm_config = RBLNLlamaForCausalLMConfig(
    # Split model across 8 NPUs
    tensor_parallel_size=8,

    # Runtime options
    device=[8, 9, 10, 11, 12, 13, 14, 15],
    create_runtimes=True,
    activate_profiler=True,  # Enable profiling for performance analysis

    # Model-specific parameters for the LLM
    max_seq_len=131072,
    batch_size=4,
    attn_impl="flash_attn",
)

Compilation without runtime creation (create_runtimes=False):

from optimum.rbln import RBLNLlamaForCausalLM, RBLNLlamaForCausalLMConfig

# Compile a model on a machine without NPU or for later use
config = RBLNLlamaForCausalLMConfig(
    create_runtimes=False,  # Compile only, don't create runtime
    npu="RBLN-CA25",  # Specify target NPU for compilation
    max_seq_len=4096,
    tensor_parallel_size=4,
    batch_size=1
)

# Export the model - will compile but not create runtimes
model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    rbln_config=config
)

# Save the compiled model for later use on NPU
model.save_pretrained("./compiled_llama_model")

# Later, on a machine with the target NPU
inference_model = RBLNLlamaForCausalLM.from_pretrained(
    "./compiled_llama_model",
    rbln_create_runtimes=True,  # Optional; defaults to True when an NPU is available
)

Two-stage workflow with separate compilation and runtime:

from optimum.rbln import RBLNResNetForImageClassification

# Stage 1: Model engineer compiles model (can be on any machine)
def compile_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "microsoft/resnet-50",
        export=True,
        rbln_create_runtimes=False,
        rbln_npu="RBLN-CA25",
        rbln_image_size=224
    )
    model.save_pretrained("./compiled_model")
    print("Model compiled and saved, ready for deployment")

# Stage 2: Deployment engineer loads model on NPU
def deploy_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "./compiled_model",
        rbln_create_runtimes=True,
    )
    print("Model loaded and ready for inference")
    return model

Functions

__init__(cls_name=None, create_runtimes=None, optimize_host_memory=None, device=None, device_map=None, activate_profiler=None, npu=None, tensor_parallel_size=None, optimum_rbln_version=None, _compile_cfgs=[], **kwargs)

Initialize a RBLN model configuration with runtime options and compile configurations.

Parameters:

  • cls_name (Optional[str], default None): The class name of the configuration. Defaults to the current class name.
  • create_runtimes (Optional[bool], default None): Whether to create RBLN runtimes. Defaults to True if an NPU is available.
  • optimize_host_memory (Optional[bool], default None): Whether to optimize host memory usage. Defaults to True.
  • device (Optional[Union[int, List[int]]], default None): The device(s) to load the model onto. Can be a single device ID or a list.
  • device_map (Optional[Dict[str, Union[int, List[int]]]], default None): Mapping from compiled model names to device IDs.
  • activate_profiler (Optional[bool], default None): Whether to activate the profiler for performance analysis.
  • npu (Optional[str], default None): The NPU device name to use for compilation.
  • tensor_parallel_size (Optional[int], default None): Size for tensor parallelism to distribute the model across devices.
  • optimum_rbln_version (Optional[str], default None): The optimum-rbln version used for this configuration.
  • _compile_cfgs (List[RBLNCompileConfig], default []): List of compilation configurations for the model.
  • **kwargs (Dict[str, Any], default {}): Additional keyword arguments.

Raises:

  • ValueError: If unexpected keyword arguments are provided.

Classes

RBLNModel

Functions

from_pretrained(model_id, export=False, rbln_config=None, **kwargs) classmethod

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Users can call it to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model that runs on RBLN NPUs.

Parameters:

  • model_id (Union[str, Path], required): The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler.
  • export (bool, default False): Whether the model should be compiled.
  • rbln_config (Optional[Union[Dict, RBLNModelConfig]], default None): Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.
  • kwargs (Dict[str, Any], default {}): Additional keyword arguments. Arguments prefixed with 'rbln_' are passed to rbln_config; the remaining arguments are passed to the HuggingFace library.

Returns:

  • Self: A RBLN model instance ready for inference on RBLN NPU devices.

from_model(model, *, rbln_config=None, **kwargs) classmethod

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

  • model (PreTrainedModel, required): The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class.
  • rbln_config (Optional[Union[Dict, RBLNModelConfig]], default None): Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.
  • kwargs (Dict[str, Any], default {}): Additional keyword arguments. Arguments prefixed with 'rbln_' are passed to rbln_config; the remaining arguments are passed to the HuggingFace library.

The method performs the following steps:

  1. Compiles the PyTorch model into an optimized RBLN graph
  2. Configures the model for the specified NPU device
  3. Creates the necessary runtime objects if requested
  4. Saves the compiled model and configurations

Returns:

  • Self: A RBLN model instance ready for inference on RBLN NPU devices.
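Where from_pretrained() starts from a model id, from_model() starts from an already-instantiated PyTorch model, which is useful when the model has been modified in memory before compilation. A minimal sketch (the rbln_config keys mirror the ResNet examples above and are illustrative):

```python
from transformers import ResNetForImageClassification
from optimum.rbln import RBLNResNetForImageClassification

# Instantiate (and optionally modify) the HuggingFace model first
torch_model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

# Then compile the in-memory model into a RBLN model
rbln_model = RBLNResNetForImageClassification.from_model(
    torch_model,
    rbln_config={"image_size": 224, "batch_size": 1},
)
```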

save_pretrained(save_directory)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

  • save_directory (Union[str, PathLike], required): The directory to save the model and its configuration files. Will be created if it doesn't exist.