
Common

API Reference

Classes

RBLNModel

Bases: RBLNBaseModel

Functions

from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod

Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation.

Parameters:

Name Type Description Default
model PreTrainedModel

The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class.

required
config Optional[PretrainedConfig]

The configuration object associated with the model.

None
rbln_config Optional[Union[RBLNModelConfig, Dict]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

The method performs the following steps:

  1. Compiles the PyTorch model into an optimized RBLN graph
  2. Configures the model for the specified NPU device
  3. Creates the necessary runtime objects if requested
  4. Saves the compiled model and configurations

Returns:

Type Description
RBLNModel

An RBLN model instance ready for inference on RBLN NPU devices.

forward(*args, return_dict=None, **kwargs)

Defines the forward pass of RBLNModel. The interface mirrors HuggingFace conventions so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, RBLNModel can replace models built on torch.nn.Module — including transformers.PreTrainedModel implementations and Diffusers components based on diffusers.ModelMixin — enabling seamless integration into existing workflows.

Parameters:

Name Type Description Default
args Any

Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., input_ids, attention_mask for transformers models, or sample, timestep for diffusers models).

()
return_dict Optional[bool]

Whether to return outputs as a dictionary-like object or as a tuple. When None:
  • For transformers models: uses self.config.use_return_dict (typically True)
  • For diffusers models: defaults to True

None
kwargs Any

Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface.

{}

Returns:

Type Description
Any

Model outputs in the same format as the original HuggingFace model.

Any

If return_dict=True, returns a dictionary-like object (e.g., BaseModelOutput, CausalLMOutput) with named fields such as logits, hidden_states, etc.

Any

If return_dict=False, returns a tuple containing the raw model outputs.

Note
  • This method maintains the exact same interface as the original HuggingFace model's forward method
  • The compiled model runs on RBLN NPU hardware for accelerated inference
  • All HuggingFace model features (generation, attention patterns, etc.) are preserved
  • Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
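The return_dict convention described above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: CausalLMOutputSketch and forward_sketch are hypothetical names, and real outputs come from the compiled RBLN runtime and HuggingFace output classes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass
class CausalLMOutputSketch:
    # Hypothetical stand-in for a HuggingFace output class such as CausalLMOutput.
    logits: list
    hidden_states: Optional[list] = None

    def to_tuple(self) -> Tuple:
        # Drop unset fields, mirroring how HuggingFace outputs convert to tuples.
        return tuple(v for v in (self.logits, self.hidden_states) if v is not None)

def forward_sketch(raw_logits: list, return_dict: Optional[bool] = None,
                   use_return_dict: bool = True) -> Union[CausalLMOutputSketch, Tuple]:
    # When return_dict is None, fall back to the config default
    # (self.config.use_return_dict for transformers models).
    if return_dict is None:
        return_dict = use_return_dict
    output = CausalLMOutputSketch(logits=raw_logits)
    return output if return_dict else output.to_tuple()
```

With return_dict left as None the config default decides the output shape; passing return_dict=False forces the tuple form.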
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace library and convert it to an RBLN model that runs on RBLN NPUs.

Parameters:

Name Type Description Default
model_id Union[str, Path]

The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler.

required
export Optional[bool]

A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id.

None
rbln_config Optional[Union[Dict, RBLNModelConfig]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

Returns:

Type Description
RBLNModel

An RBLN model instance ready for inference on RBLN NPU devices.
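The kwargs routing described above (arguments prefixed with rbln_ go to rbln_config, the rest to the HuggingFace library) can be sketched as a small pure-Python helper. The function name split_rbln_kwargs is hypothetical; the real splitting happens inside the library.

```python
from typing import Any, Dict, Tuple

def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Sketch of the documented kwargs routing: keys prefixed with
    ``rbln_`` are forwarded to rbln_config (prefix stripped), while
    remaining keys are passed to the HuggingFace library."""
    rbln_kwargs = {k[len("rbln_"):]: v for k, v in kwargs.items() if k.startswith("rbln_")}
    hf_kwargs = {k: v for k, v in kwargs.items() if not k.startswith("rbln_")}
    return rbln_kwargs, hf_kwargs
```

For example, a call with rbln_batch_size=4 and revision="main" would route batch_size=4 into the RBLN configuration and revision="main" to the HuggingFace loader.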

save_pretrained(save_directory, push_to_hub=False, **kwargs)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.

Parameters:

Name Type Description Default
save_directory Union[str, Path]

Directory where to save the model file.

required
push_to_hub bool

Whether or not to push your model to the HuggingFace model hub after saving it.

False
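The save/reload contract can be illustrated with a minimal sketch. The helper name save_pretrained_sketch and the exact file layout are hypothetical; the rbln_config.json file name matches the configuration file the library documents elsewhere on this page, but the real method also writes compiled model artifacts.

```python
import json
import tempfile
from pathlib import Path

def save_pretrained_sketch(save_directory, config: dict) -> Path:
    """Sketch of the save side: persist the configuration as
    rbln_config.json so a later from_pretrained() can find it."""
    out = Path(save_directory)
    out.mkdir(parents=True, exist_ok=True)
    (out / "rbln_config.json").write_text(json.dumps(config))
    return out

with tempfile.TemporaryDirectory() as d:
    path = save_pretrained_sketch(d, {"cls_name": "RBLNLlamaForCausalLMConfig"})
    # A reload would parse the same file back into a config object.
    reloaded = json.loads((path / "rbln_config.json").read_text())
```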

Classes

RBLNBaseModel

Bases: SubModulesMixin, PushToHubMixin, PreTrainedModel

Functions

from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace library and convert it to an RBLN model that runs on RBLN NPUs.

Parameters:

Name Type Description Default
model_id Union[str, Path]

The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler.

required
export Optional[bool]

A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id.

None
rbln_config Optional[Union[Dict, RBLNModelConfig]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

Returns:

Type Description
RBLNModel

An RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory, push_to_hub=False, **kwargs)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.

Parameters:

Name Type Description Default
save_directory Union[str, Path]

Directory where to save the model file.

required
push_to_hub bool

Whether or not to push your model to the HuggingFace model hub after saving it.

False

Classes

RBLNAutoConfig

Resolver and factory for RBLN model configurations.

This class selects the concrete RBLNModelConfig subclass, validates the provided data, and returns a frozen configuration object that serves as the single source of truth during export and load. It does not define the schema or control model behavior.

Functions

load_from_dict(config_dict) staticmethod

Build an RBLNModelConfig from a plain dictionary.

The dictionary must contain cls_name, which identifies the concrete configuration class to instantiate. All other keys are forwarded to the target class initializer. This method does not mutate config_dict.

Parameters:

Name Type Description Default
config_dict Dict[str, Any]

Mapping typically created by json.load or yaml.safe_load. For example, the parsed contents of rbln_config.json.

required

Returns:

Name Type Description
RBLNModelConfig RBLNModelConfig

A configuration instance. The specific subclass is selected by config_dict["cls_name"].

Raises:

Type Description
ValueError

If cls_name is missing.

Exception

Any error raised by the target config class during init.

Examples:

>>> data = {
...     "cls_name": "RBLNLlamaForCausalLMConfig",
...     "create_runtimes": False,
...     "tensor_parallel_size": 4
... }
>>> cfg = RBLNAutoConfig.load_from_dict(data)
register(config, exist_ok=False) staticmethod

Register a new configuration for this class.

Parameters:

Name Type Description Default
config RBLNModelConfig

The config to register.

required
exist_ok bool

Whether to allow registering an already registered model.

False
load(path, passed_rbln_config=None, kwargs={}, return_unused_kwargs=False) staticmethod

Load an RBLNModelConfig from a path. The class name is automatically inferred from the rbln_config.json file.

Parameters:

Name Type Description Default
path str

Path to the RBLNModelConfig.

required
passed_rbln_config Optional[RBLNModelConfig]

An RBLNModelConfig whose runtime options are passed through to the loaded configuration.

None

Returns:

Name Type Description
RBLNModelConfig Union[RBLNModelConfig, Tuple[RBLNModelConfig, Dict[str, Any]]]

The loaded RBLNModelConfig. When return_unused_kwargs=True, a tuple of the config and a dict of unused kwargs is returned instead.

RBLNModelConfig

Bases: RBLNSerializableConfigProtocol

Base configuration class for RBLN models that handles compilation settings, runtime options, and submodules.

This class provides functionality for:

  1. Managing compilation configurations for RBLN devices
  2. Configuring runtime behavior such as device placement
  3. Handling nested configuration objects for complex model architectures
  4. Serializing and deserializing configurations

Examples:

Using with RBLNModel.from_pretrained():

from optimum.rbln import RBLNResNetForImageClassification

# Method 1: Using rbln_ prefixed arguments (recommended for simple cases)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,  # Compile the model
    rbln_image_size=224,
    rbln_batch_size=16,
    rbln_create_runtimes=True,
    rbln_device=0
)

# Method 2: Using a config dictionary
rbln_config_dict = {
    "image_size": 224,
    "batch_size": 16,
    "create_runtimes": True
}
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=rbln_config_dict
)

# Method 3: Using a RBLNModelConfig instance
from optimum.rbln import RBLNResNetForImageClassificationConfig

config = RBLNResNetForImageClassificationConfig(
    image_size=224,
    batch_size=16,
    create_runtimes=True
)

model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config
)

# Method 4: Combining a config object with override parameters
# (rbln_ prefixed parameters take precedence over rbln_config values)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config,
    rbln_image_size=320,  # This overrides the value in config
    rbln_device=1         # This sets a new value
)

Save and load configuration:

# Save to disk
config.save("/path/to/model")

# Using AutoConfig
loaded_config = RBLNAutoConfig.load("/path/to/model")

Converting between configuration formats:

# Converting a dictionary to a config instance
config_dict = {
    "image_size": 224,
    "batch_size": 8,
    "create_runtimes": True
}
config = RBLNResNetForImageClassificationConfig(**config_dict)

Configuration for language models:

from optimum.rbln import RBLNLlamaForCausalLMConfig, RBLNCompileConfig

# Configure a LLaMA for RBLN
config = RBLNLlamaForCausalLMConfig(
    max_seq_len=4096,
    device=[0, 1, 2, 3],
    tensor_parallel_size=4  # For multi-NPU parallel inference
)

Working with models that have submodules:

from optimum.rbln import RBLNLlavaNextForConditionalGeneration

# Configuring a model with submodules
# LlavaNext has a vision_tower and a language_model submodule
model = RBLNLlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    export=True,
    rbln_config={
        # Configuration for the main model (the projector, which is not a submodule)
        "create_runtimes": True,
        "device": 0,

        # Submodule configurations as nested dictionaries
        "vision_tower": {
            "image_size": 336,
        },
        "language_model": {
            "tensor_parallel_size": 4,  # Distribute across 4 NPUs
            "max_seq_len": 8192,
            "use_inputs_embeds": True,
            "batch_size": 1,
        },
    },
)

Advanced multi-device deployment with tensor parallelism:

from optimum.rbln import RBLNLlamaForCausalLMConfig

# Setup a complex multi-device configuration for large language models
llm_config = RBLNLlamaForCausalLMConfig(
    # Split model across 8 NPUs
    tensor_parallel_size=8,

    # Runtime options
    device=[8, 9, 10, 11, 12, 13, 14, 15],
    create_runtimes=True,
    activate_profiler=True,  # Enable profiling for performance analysis

    # Model-specific parameters for the LLM
    max_seq_len=131072,
    batch_size=4,
    attn_impl="flash_attn",
)

Compilation without runtime creation (create_runtimes=False):

from optimum.rbln import RBLNLlamaForCausalLM, RBLNLlamaForCausalLMConfig

# Compile a model on a machine without NPU or for later use
config = RBLNLlamaForCausalLMConfig(
    create_runtimes=False,  # Compile only, don't create runtime
    npu="RBLN-CA25",  # Specify target NPU for compilation
    max_seq_len=4096,
    tensor_parallel_size=4,
    batch_size=1
)

# Export the model - will compile but not create runtimes
model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    rbln_config=config
)

# Save the compiled model for later use on NPU
model.save_pretrained("./compiled_llama_model")

# Later, on a machine with the target NPU
inference_model = RBLNLlamaForCausalLM.from_pretrained(
    "./compiled_llama_model",
    rbln_create_runtimes=True,  # Now create runtimes (Optional)
)

Two-stage workflow with separate compilation and runtime:

from optimum.rbln import RBLNResNetForImageClassification

# Stage 1: Model engineer compiles model (can be on any machine)
def compile_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "microsoft/resnet-50",
        export=True,
        rbln_create_runtimes=False,
        rbln_npu="RBLN-CA25",
        rbln_image_size=224
    )
    model.save_pretrained("./compiled_model")
    print("Model compiled and saved, ready for deployment")

# Stage 2: Deployment engineer loads model on NPU
def deploy_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "./compiled_model",
        rbln_create_runtimes=True,
    )
    print("Model loaded and ready for inference")
    return model

Functions

__init__(cls_name=None, create_runtimes=None, optimize_host_memory=None, device=None, device_map=None, activate_profiler=None, npu=None, tensor_parallel_size=None, timeout=None, optimum_rbln_version=None, _torch_dtype=None, _compile_cfgs=[], **kwargs)

Initialize an RBLN model configuration with runtime options and compile configurations.

Parameters:

Name Type Description Default
cls_name Optional[str]

The class name of the configuration. Defaults to the current class name.

None
create_runtimes Optional[bool]

Whether to create RBLN runtimes. Defaults to True.

None
optimize_host_memory Optional[bool]

Whether to optimize host memory usage. Defaults to True.

None
device Optional[Union[int, List[int]]]

The device(s) to load the model onto. Can be a single device ID or a list.

None
device_map Optional[Dict[str, Union[int, List[int]]]]

Mapping from compiled model names to device IDs.

None
activate_profiler Optional[bool]

Whether to activate the profiler for performance analysis.

None
npu Optional[str]

The NPU device name to use for compilation.

None
tensor_parallel_size Optional[int]

Size for tensor parallelism to distribute the model across devices.

None
timeout Optional[int]

The timeout for the runtime in seconds. Defaults to 60 if not provided.

None
optimum_rbln_version Optional[str]

The optimum-rbln version used for this configuration.

None
_torch_dtype Optional[str]

The data type to use for the model.

None
_compile_cfgs List[RBLNCompileConfig]

List of compilation configurations for the model.

[]
kwargs Any

Additional keyword arguments.

{}

Raises:

Type Description
ValueError

If unexpected keyword arguments are provided.

load(path, **kwargs) classmethod

Load an RBLNModelConfig from a path.

Parameters:

Name Type Description Default
path str

Path to the RBLNModelConfig file or directory containing the config file.

required
kwargs Any

Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration.

{}

Returns:

Name Type Description
RBLNModelConfig RBLNModelConfig

The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
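The override rule above (kwargs whose keys start with 'rbln_' have the prefix removed and update the loaded configuration) can be sketched as a dictionary merge. The helper name apply_load_overrides is hypothetical and the real method operates on config objects rather than plain dicts.

```python
from typing import Any, Dict

def apply_load_overrides(config: Dict[str, Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of the documented override rule for load(): keys starting
    with 'rbln_' are stripped of the prefix and update the loaded
    configuration; other kwargs are ignored in this sketch."""
    updated = dict(config)  # do not mutate the loaded config in place
    for key, value in kwargs.items():
        if key.startswith("rbln_"):
            updated[key[len("rbln_"):]] = value
    return updated
```

For instance, loading a config that was compiled with batch_size=1 and passing rbln_batch_size=4 would yield an effective batch_size of 4.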