
Common

API Reference

Classes

RBLNModel

Bases: RBLNBaseModel

Functions

from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod

Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation.

Parameters:

Name Type Description Default
model PreTrainedModel

The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class.

required
config Optional[PretrainedConfig]

The configuration object associated with the model.

None
rbln_config Optional[Union[RBLNModelConfig, Dict]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

The method performs the following steps:

  1. Compiles the PyTorch model into an optimized RBLN graph
  2. Configures the model for the specified NPU device
  3. Creates the necessary runtime objects if requested
  4. Saves the compiled model and configurations

Returns:

Type Description
RBLNModel

An RBLN model instance ready for inference on RBLN NPU devices.

forward(*args, return_dict=None, **kwargs)

Defines the forward pass of RBLNModel. The interface mirrors HuggingFace conventions so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, RBLNModel can replace models built on torch.nn.Module — including transformers.PreTrainedModel implementations and Diffusers components based on diffusers.ModelMixin — enabling seamless integration into existing workflows.

Parameters:

Name Type Description Default
args Any

Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., input_ids, attention_mask for transformers models, or sample, timestep for diffusers models).

()
return_dict Optional[bool]

Whether to return outputs as a dictionary-like object or as a tuple. When None:
  • For transformers models: uses self.config.use_return_dict (typically True)
  • For diffusers models: defaults to True

None
kwargs Any

Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface.

{}

Returns:

Type Description
Any

Model outputs in the same format as the original HuggingFace model.

Any

If return_dict=True, returns a dictionary-like object (e.g., BaseModelOutput, CausalLMOutput) with named fields such as logits, hidden_states, etc.

Any

If return_dict=False, returns a tuple containing the raw model outputs.

Note
  • This method maintains the exact same interface as the original HuggingFace model's forward method
  • The compiled model runs on RBLN NPU hardware for accelerated inference
  • All HuggingFace model features (generation, attention patterns, etc.) are preserved
  • Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
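The return_dict convention described above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: CausalLMOutputSketch and forward_sketch are hypothetical names, and real outputs come from the compiled RBLN runtime and HuggingFace output classes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass
class CausalLMOutputSketch:
    # Hypothetical stand-in for a HuggingFace output class such as CausalLMOutput.
    logits: list
    hidden_states: Optional[list] = None

    def to_tuple(self) -> Tuple:
        # Drop unset fields, mirroring how HuggingFace outputs convert to tuples.
        return tuple(v for v in (self.logits, self.hidden_states) if v is not None)

def forward_sketch(raw_logits: list, return_dict: Optional[bool] = None,
                   use_return_dict: bool = True) -> Union[CausalLMOutputSketch, Tuple]:
    # When return_dict is None, fall back to the config default
    # (self.config.use_return_dict for transformers models).
    if return_dict is None:
        return_dict = use_return_dict
    output = CausalLMOutputSketch(logits=raw_logits)
    return output if return_dict else output.to_tuple()
```

With return_dict left as None the config default decides the output shape; passing return_dict=False forces the tuple form.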
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace library and convert it to an RBLN model that runs on RBLN NPUs.

Parameters:

Name Type Description Default
model_id Union[str, Path]

The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler.

required
export Optional[bool]

A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id.

None
rbln_config Optional[Union[Dict, RBLNModelConfig]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

Returns:

Type Description
RBLNModel

An RBLN model instance ready for inference on RBLN NPU devices.
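The kwargs routing described above (arguments prefixed with rbln_ go to rbln_config, the rest to the HuggingFace library) can be sketched as a small pure-Python helper. The function name split_rbln_kwargs is hypothetical; the real splitting happens inside the library.

```python
from typing import Any, Dict, Tuple

def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Sketch of the documented kwargs routing: keys prefixed with
    ``rbln_`` are forwarded to rbln_config (prefix stripped), while
    remaining keys are passed to the HuggingFace library."""
    rbln_kwargs = {k[len("rbln_"):]: v for k, v in kwargs.items() if k.startswith("rbln_")}
    hf_kwargs = {k: v for k, v in kwargs.items() if not k.startswith("rbln_")}
    return rbln_kwargs, hf_kwargs
```

For example, a call with rbln_batch_size=4 and revision="main" would route batch_size=4 into the RBLN configuration and revision="main" to the HuggingFace loader.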

save_pretrained(save_directory, push_to_hub=False, **kwargs)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.

Parameters:

Name Type Description Default
save_directory Union[str, Path]

Directory where to save the model file.

required
push_to_hub bool

Whether or not to push your model to the HuggingFace model hub after saving it.

False
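The save/reload contract can be illustrated with a minimal sketch. The helper name save_pretrained_sketch and the exact file layout are hypothetical; the rbln_config.json file name matches the configuration file the library documents elsewhere on this page, but the real method also writes compiled model artifacts.

```python
import json
import tempfile
from pathlib import Path

def save_pretrained_sketch(save_directory, config: dict) -> Path:
    """Sketch of the save side: persist the configuration as
    rbln_config.json so a later from_pretrained() can find it."""
    out = Path(save_directory)
    out.mkdir(parents=True, exist_ok=True)
    (out / "rbln_config.json").write_text(json.dumps(config))
    return out

with tempfile.TemporaryDirectory() as d:
    path = save_pretrained_sketch(d, {"cls_name": "RBLNLlamaForCausalLMConfig"})
    # A reload would parse the same file back into a config object.
    reloaded = json.loads((path / "rbln_config.json").read_text())
```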

Classes

RBLNBaseModel

Bases: SubModulesMixin, PushToHubMixin, PreTrainedModel

Functions

from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace library and convert it to an RBLN model that runs on RBLN NPUs.

Parameters:

Name Type Description Default
model_id Union[str, Path]

The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler.

required
export Optional[bool]

A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id.

None
rbln_config Optional[Union[Dict, RBLNModelConfig]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

Returns:

Type Description
RBLNModel

An RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory, push_to_hub=False, **kwargs)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.

Parameters:

Name Type Description Default
save_directory Union[str, Path]

Directory where to save the model file.

required
push_to_hub bool

Whether or not to push your model to the HuggingFace model hub after saving it.

False

Classes

RBLNAutoConfig

Resolver and factory for RBLN model configurations.

This class selects the concrete RBLNModelConfig subclass, validates the provided data, and returns a frozen configuration object that serves as the single source of truth during export and load. It does not define the schema or control model behavior.

Functions

load_from_dict(config_dict) staticmethod

Build an RBLNModelConfig from a plain dictionary.

The dictionary must contain cls_name, which identifies the concrete configuration class to instantiate. All other keys are forwarded to the target class initializer. This method does not mutate config_dict.

Parameters:

Name Type Description Default
config_dict Dict[str, Any]

Mapping typically created by json.load or yaml.safe_load. For example, the parsed contents of rbln_config.json.

required

Returns:

Name Type Description
RBLNModelConfig RBLNModelConfig

A configuration instance. The specific subclass is selected by config_dict["cls_name"].

Raises:

Type Description
ValueError

If cls_name is missing.

Exception

Any error raised by the target config class during init.

Examples:

>>> data = {
...     "cls_name": "RBLNLlamaForCausalLMConfig",
...     "create_runtimes": False,
...     "tensor_parallel_size": 4
... }
>>> cfg = RBLNAutoConfig.load_from_dict(data)
register(config, exist_ok=False) staticmethod

Register a new configuration for this class.

Parameters:

Name Type Description Default
config RBLNModelConfig

The config to register.

required
exist_ok bool

Whether to allow registering an already registered model.

False
load(path, passed_rbln_config=None, kwargs={}, return_unused_kwargs=False) staticmethod

Load an RBLNModelConfig from a path. The class name is automatically inferred from the rbln_config.json file.

Parameters:

Name Type Description Default
path str

Path to the RBLNModelConfig.

required
passed_rbln_config Optional[RBLNModelConfig]

An RBLNModelConfig whose runtime options are passed through to the loaded configuration.

None

Returns:

Name Type Description
RBLNModelConfig Union[RBLNModelConfig, Tuple[RBLNModelConfig, Dict[str, Any]]]

The loaded RBLNModelConfig. When return_unused_kwargs=True, a tuple of the config and a dict of unused kwargs is returned instead.

RBLNModelConfig

Bases: RBLNSerializableConfigProtocol

Base configuration class for RBLN models that handles compilation settings, runtime options, and submodules.

This class provides functionality for:

  1. Managing compilation configurations for RBLN devices
  2. Configuring runtime behavior such as device placement
  3. Handling nested configuration objects for complex model architectures
  4. Serializing and deserializing configurations

Examples:

Using with RBLNModel.from_pretrained():

from optimum.rbln import RBLNResNetForImageClassification

# Method 1: Using rbln_ prefixed arguments (recommended for simple cases)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,  # Compile the model
    rbln_image_size=224,
    rbln_batch_size=16,
    rbln_create_runtimes=True,
    rbln_device=0
)

# Method 2: Using a config dictionary
rbln_config_dict = {
    "image_size": 224,
    "batch_size": 16,
    "create_runtimes": True
}
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=rbln_config_dict
)

# Method 3: Using a RBLNModelConfig instance
from optimum.rbln import RBLNResNetForImageClassificationConfig

config = RBLNResNetForImageClassificationConfig(
    image_size=224,
    batch_size=16,
    create_runtimes=True
)

model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config
)

# Method 4: Combining a config object with override parameters
# (rbln_ prefixed parameters take precedence over rbln_config values)
model = RBLNResNetForImageClassification.from_pretrained(
    "model_id",
    export=True,
    rbln_config=config,
    rbln_image_size=320,  # This overrides the value in config
    rbln_device=1         # This sets a new value
)

Save and load configuration:

# Save to disk
config.save("/path/to/model")

# Using AutoConfig
loaded_config = RBLNAutoConfig.load("/path/to/model")

Converting between configuration formats:

# Converting a dictionary to a config instance
config_dict = {
    "image_size": 224,
    "batch_size": 8,
    "create_runtimes": True
}
config = RBLNResNetForImageClassificationConfig(**config_dict)

Configuration for language models:

from optimum.rbln import RBLNLlamaForCausalLMConfig, RBLNCompileConfig

# Configure a LLaMA for RBLN
config = RBLNLlamaForCausalLMConfig(
    max_seq_len=4096,
    device=[0, 1, 2, 3],
    tensor_parallel_size=4  # For multi-NPU parallel inference
)

Working with models that have submodules:

from optimum.rbln import RBLNLlavaNextForConditionalGeneration

# Configuring a model with submodules
# LlavaNext has a vision_tower and a language_model submodule
model = RBLNLlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    export=True,
    rbln_config={
        # Configuration for the main model (the projector, which is not a submodule)
        "create_runtimes": True,
        "device": 0,

        # Submodule configurations as nested dictionaries
        "vision_tower": {
            "image_size": 336,
        },
        "language_model": {
            "tensor_parallel_size": 4,  # Distribute across 4 NPUs
            "max_seq_len": 8192,
            "use_inputs_embeds": True,
            "batch_size": 1,
        },
    },
)

Advanced multi-device deployment with tensor parallelism:

from optimum.rbln import RBLNLlamaForCausalLMConfig

# Setup a complex multi-device configuration for large language models
llm_config = RBLNLlamaForCausalLMConfig(
    # Split model across 8 NPUs
    tensor_parallel_size=8,

    # Runtime options
    device=[8, 9, 10, 11, 12, 13, 14, 15],
    create_runtimes=True,
    activate_profiler=True,  # Enable profiling for performance analysis

    # Model-specific parameters for the LLM
    max_seq_len=131072,
    batch_size=4,
    attn_impl="flash_attn",
)

Compilation without runtime creation (create_runtimes=False):

from optimum.rbln import RBLNLlamaForCausalLM, RBLNLlamaForCausalLMConfig

# Compile a model on a machine without NPU or for later use
config = RBLNLlamaForCausalLMConfig(
    create_runtimes=False,  # Compile only, don't create runtime
    npu="RBLN-CA25",  # Specify target NPU for compilation
    max_seq_len=4096,
    tensor_parallel_size=4,
    batch_size=1
)

# Export the model - will compile but not create runtimes
model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    rbln_config=config
)

# Save the compiled model for later use on NPU
model.save_pretrained("./compiled_llama_model")

# Later, on a machine with the target NPU
inference_model = RBLNLlamaForCausalLM.from_pretrained(
    "./compiled_llama_model",
    rbln_create_runtimes=True,  # Now create runtimes (Optional)
)

Two-stage workflow with separate compilation and runtime:

from optimum.rbln import RBLNResNetForImageClassification

# Stage 1: Model engineer compiles model (can be on any machine)
def compile_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "microsoft/resnet-50",
        export=True,
        rbln_create_runtimes=False,
        rbln_npu="RBLN-CA25",
        rbln_image_size=224
    )
    model.save_pretrained("./compiled_model")
    print("Model compiled and saved, ready for deployment")

# Stage 2: Deployment engineer loads model on NPU
def deploy_model():
    model = RBLNResNetForImageClassification.from_pretrained(
        "./compiled_model",
        rbln_create_runtimes=True,
    )
    print("Model loaded and ready for inference")
    return model

Functions

__init__(cls_name=None, create_runtimes=None, optimize_host_memory=None, device=None, device_map=None, activate_profiler=None, npu=None, tensor_parallel_size=None, timeout=None, optimum_rbln_version=None, _torch_dtype=None, _compile_cfgs=[], **kwargs)

Initialize an RBLN model configuration with runtime options and compile configurations.

Parameters:

Name Type Description Default
cls_name Optional[str]

The class name of the configuration. Defaults to the current class name.

None
create_runtimes Optional[bool]

Whether to create RBLN runtimes. Defaults to True.

None
optimize_host_memory Optional[bool]

Whether to optimize host memory usage. Defaults to True.

None
device Optional[Union[int, List[int]]]

The device(s) to load the model onto. Can be a single device ID or a list.

None
device_map Optional[Dict[str, Union[int, List[int]]]]

Mapping from compiled model names to device IDs.

None
activate_profiler Optional[bool]

Whether to activate the profiler for performance analysis.

None
npu Optional[str]

The NPU device name to use for compilation.

None
tensor_parallel_size Optional[int]

Size for tensor parallelism to distribute the model across devices.

None
timeout Optional[int]

The timeout for the runtime in seconds. Defaults to 60 if not provided.

None
optimum_rbln_version Optional[str]

The optimum-rbln version used for this configuration.

None
_torch_dtype Optional[str]

The data type to use for the model.

None
_compile_cfgs List[RBLNCompileConfig]

List of compilation configurations for the model.

[]
kwargs Any

Additional keyword arguments.

{}

Raises:

Type Description
ValueError

If unexpected keyword arguments are provided.

load(path, **kwargs) classmethod

Load an RBLNModelConfig from a path.

Parameters:

Name Type Description Default
path str

Path to the RBLNModelConfig file or directory containing the config file.

required
kwargs Any

Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration.

{}

Returns:

Name Type Description
RBLNModelConfig RBLNModelConfig

The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
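The override rule above (kwargs whose keys start with 'rbln_' have the prefix removed and update the loaded configuration) can be sketched as a dictionary merge. The helper name apply_load_overrides is hypothetical and the real method operates on config objects rather than plain dicts.

```python
from typing import Any, Dict

def apply_load_overrides(config: Dict[str, Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of the documented override rule for load(): keys starting
    with 'rbln_' are stripped of the prefix and update the loaded
    configuration; other kwargs are ignored in this sketch."""
    updated = dict(config)  # do not mutate the loaded config in place
    for key, value in kwargs.items():
        if key.startswith("rbln_"):
            updated[key[len("rbln_"):]] = value
    return updated
```

For instance, loading a config that was compiled with batch_size=1 and passing rbln_batch_size=4 would yield an effective batch_size of 4.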