BLIP-2

BLIP-2 is a multimodal model that can process both image and text inputs. It is mainly used for tasks such as Visual Question Answering (VQA) and image captioning. RBLN NPUs can accelerate BLIP-2 model inference through Optimum RBLN.

API Reference

Classes

RBLNBlip2VisionModel

Bases: RBLNModel

RBLN optimized BLIP-2 vision encoder model.

This class provides hardware-accelerated inference for BLIP-2 vision encoders on RBLN devices, supporting image encoding for multimodal vision-language tasks.

Functions

from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name Type Description Default
model PreTrainedModel

The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class.

required
config Optional[PretrainedConfig]

The configuration object associated with the model.

None
rbln_config Optional[Union[RBLNModelConfig, Dict]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

The method performs the following steps:

  1. Compiles the PyTorch model into an optimized RBLN graph
  2. Configures the model for the specified NPU device
  3. Creates the necessary runtime objects if requested
  4. Saves the compiled model and configurations

Returns:

Type Description
RBLNModel

A RBLN model instance ready for inference on RBLN NPU devices.
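
The routing of `rbln_`-prefixed keyword arguments described for `kwargs` above can be illustrated with a small, library-free sketch. The helper name `split_rbln_kwargs` is hypothetical, and stripping the prefix before handing the key to the configuration is an assumption made for illustration; the actual Optimum RBLN implementation may differ.

```python
from typing import Any, Dict, Tuple

RBLN_PREFIX = "rbln_"


def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate rbln_-prefixed kwargs from the rest."""
    rbln_kwargs: Dict[str, Any] = {}
    hf_kwargs: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key.startswith(RBLN_PREFIX):
            # Assumption: the prefix is stripped so the key matches a config field.
            rbln_kwargs[key[len(RBLN_PREFIX):]] = value
        else:
            # Everything else would be forwarded to the HuggingFace library.
            hf_kwargs[key] = value
    return rbln_kwargs, hf_kwargs


rbln_kwargs, hf_kwargs = split_rbln_kwargs(
    {"rbln_batch_size": 1, "rbln_tensor_parallel_size": 4, "revision": "main"}
)
print(rbln_kwargs)  # {'batch_size': 1, 'tensor_parallel_size': 4}
print(hf_kwargs)    # {'revision': 'main'}
```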

from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod

Works like from_pretrained() in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace library and convert it to a RBLN model that runs on RBLN NPUs.

Parameters:

Name Type Description Default
model_id Union[str, Path]

The ID of the pre-trained model to load. This can be a HuggingFace model hub ID, a local path, or the ID of a model already compiled with the RBLN Compiler.

required
export Optional[bool]

Whether the model should be compiled. If None, this is determined by whether compiled model files exist at model_id.

None
rbln_config Optional[Union[Dict, RBLNModelConfig]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

Returns:

Type Description
RBLNModel

A RBLN model instance ready for inference on RBLN NPU devices.
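
How `export=None` might be resolved can be sketched as follows. This is a minimal illustration, not the library's code: the helper `resolve_export` is hypothetical, and the idea that compiled artifacts can be recognized by a `.rbln` file suffix is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_export(model_id: Union[str, Path], export: Optional[bool] = None) -> bool:
    """Hypothetical helper: decide whether the model still needs compiling."""
    if export is not None:
        return export  # An explicit flag always wins.
    path = Path(model_id)
    # Assumption: compiled artifacts are files ending in ".rbln".
    has_compiled = path.is_dir() and any(path.glob("*.rbln"))
    return not has_compiled


with tempfile.TemporaryDirectory() as tmp:
    empty_dir_needs_export = resolve_export(tmp)
    (Path(tmp) / "compiled_model.rbln").touch()
    compiled_dir_needs_export = resolve_export(tmp)

print(empty_dir_needs_export, compiled_dir_needs_export)  # True False
```

Passing `export=True` or `export=False` explicitly skips the detection step entirely, which is the safer choice in scripts.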

save_pretrained(save_directory, push_to_hub=False, **kwargs)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained class method.

Parameters:

Name Type Description Default
save_directory Union[str, Path]

Directory where to save the model file.

required
push_to_hub bool

Whether or not to push your model to the HuggingFace model hub after saving it.

False

RBLNBlip2QFormerModel

Bases: RBLNModel

RBLN optimized BLIP-2 Q-Former model.

This class provides hardware-accelerated inference for BLIP-2 Q-Former models on RBLN devices, which bridge vision and language modalities through cross-attention mechanisms for multimodal understanding tasks.

Functions

from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name Type Description Default
model PreTrainedModel

The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class.

required
config Optional[PretrainedConfig]

The configuration object associated with the model.

None
rbln_config Optional[Union[RBLNModelConfig, Dict]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

The method performs the following steps:

  1. Compiles the PyTorch model into an optimized RBLN graph
  2. Configures the model for the specified NPU device
  3. Creates the necessary runtime objects if requested
  4. Saves the compiled model and configurations

Returns:

Type Description
RBLNModel

A RBLN model instance ready for inference on RBLN NPU devices.

from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod

Works like from_pretrained() in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace library and convert it to a RBLN model that runs on RBLN NPUs.

Parameters:

Name Type Description Default
model_id Union[str, Path]

The ID of the pre-trained model to load. This can be a HuggingFace model hub ID, a local path, or the ID of a model already compiled with the RBLN Compiler.

required
export Optional[bool]

Whether the model should be compiled. If None, this is determined by whether compiled model files exist at model_id.

None
rbln_config Optional[Union[Dict, RBLNModelConfig]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

Returns:

Type Description
RBLNModel

A RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory, push_to_hub=False, **kwargs)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained class method.

Parameters:

Name Type Description Default
save_directory Union[str, Path]

Directory where to save the model file.

required
push_to_hub bool

Whether or not to push your model to the HuggingFace model hub after saving it.

False

RBLNBlip2ForConditionalGeneration

Bases: RBLNModel

RBLNBlip2ForConditionalGeneration is a multi-modal model that integrates vision and language processing capabilities, optimized for RBLN NPUs. It is designed for conditional generation tasks that involve both image and text inputs.

This model inherits from RBLNModel. Check the superclass documentation for the generic methods the library implements for all its models.

Important Note

This model includes a Large Language Model (LLM) as a submodule. For optimal performance, it is highly recommended to use tensor parallelism for the language model. This can be achieved through the rbln_config parameter of the from_pretrained method. Refer to the from_pretrained documentation and the RBLNBlip2ForConditionalGenerationConfig class for details.

Examples:

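from optimum.rbln import RBLNBlip2ForConditionalGeneration

# Compile the HuggingFace checkpoint for RBLN NPUs (export=True requests compilation).
model = RBLNBlip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    export=True,
    rbln_config={
        # Settings for the LLM submodule.
        "language_model": {
            "batch_size": 1,
            "max_seq_len": 2048,
            "tensor_parallel_size": 1,
            "use_inputs_embeds": True,
        },
    },
)

# Save the compiled model and its configuration so they can be re-loaded later.
model.save_pretrained("compiled-blip2-opt-2.7b")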
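
Because the note above recommends tensor parallelism for the language model while the example uses a `tensor_parallel_size` of 1, it can help to build the `rbln_config` dictionary programmatically. The sketch below reuses only the keys shown in the example; the helper name `build_blip2_rbln_config` is hypothetical, and the value 4 is purely illustrative since the right setting depends on the available NPUs.

```python
from typing import Any, Dict


def build_blip2_rbln_config(tensor_parallel_size: int = 1, max_seq_len: int = 2048) -> Dict[str, Any]:
    """Hypothetical helper returning an rbln_config dict for BLIP-2."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be a positive integer")
    return {
        "language_model": {
            "batch_size": 1,
            "max_seq_len": max_seq_len,
            "tensor_parallel_size": tensor_parallel_size,
            "use_inputs_embeds": True,
        },
    }


rbln_config = build_blip2_rbln_config(tensor_parallel_size=4)
print(rbln_config["language_model"]["tensor_parallel_size"])  # 4

# Intended use (needs optimum-rbln and an RBLN NPU, so it is left commented out):
# model = RBLNBlip2ForConditionalGeneration.from_pretrained(
#     "Salesforce/blip2-opt-2.7b", export=True, rbln_config=rbln_config
# )
```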

Functions

from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name Type Description Default
model PreTrainedModel

The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class.

required
config Optional[PretrainedConfig]

The configuration object associated with the model.

None
rbln_config Optional[Union[RBLNModelConfig, Dict]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

The method performs the following steps:

  1. Compiles the PyTorch model into an optimized RBLN graph
  2. Configures the model for the specified NPU device
  3. Creates the necessary runtime objects if requested
  4. Saves the compiled model and configurations

Returns:

Type Description
RBLNModel

A RBLN model instance ready for inference on RBLN NPU devices.

from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod

Works like from_pretrained() in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace library and convert it to a RBLN model that runs on RBLN NPUs.

Parameters:

Name Type Description Default
model_id Union[str, Path]

The ID of the pre-trained model to load. This can be a HuggingFace model hub ID, a local path, or the ID of a model already compiled with the RBLN Compiler.

required
export Optional[bool]

Whether the model should be compiled. If None, this is determined by whether compiled model files exist at model_id.

None
rbln_config Optional[Union[Dict, RBLNModelConfig]]

Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNLlamaForCausalLMConfig for Llama models). For detailed configuration options, see the specific model's configuration class documentation.

None
kwargs Any

Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library.

{}

Returns:

Type Description
RBLNModel

A RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory, push_to_hub=False, **kwargs)

Saves a model and its configuration file to a directory, so that it can be re-loaded using the optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained class method.

Parameters:

Name Type Description Default
save_directory Union[str, Path]

Directory where to save the model file.

required
push_to_hub bool

Whether or not to push your model to the HuggingFace model hub after saving it.

False

forward(*args, return_dict=None, **kwargs)

Defines the forward pass of RBLNModel. The interface mirrors HuggingFace conventions so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, RBLNModel can replace models built on torch.nn.Module — including transformers.PreTrainedModel implementations and Diffusers components based on diffusers.ModelMixin — enabling seamless integration into existing workflows.

Parameters:

Name Type Description Default
args Any

Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., input_ids, attention_mask for transformers models, or sample, timestep for diffusers models).

()
return_dict Optional[bool]

Whether to return outputs as a dictionary-like object or as a tuple. When None:
  • For transformers models: uses self.config.use_return_dict (typically True)
  • For diffusers models: defaults to True

None
kwargs Any

Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface.

{}

Returns:

Type Description
Any

Model outputs in the same format as the original HuggingFace model:
  • If return_dict=True, returns a dictionary-like object (e.g., BaseModelOutput, CausalLMOutput) with named fields such as logits, hidden_states, etc.
  • If return_dict=False, returns a tuple containing the raw model outputs.

Note
  • This method maintains the exact same interface as the original HuggingFace model's forward method
  • The compiled model runs on RBLN NPU hardware for accelerated inference
  • All HuggingFace model features (generation, attention patterns, etc.) are preserved
  • Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
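
The `return_dict` convention described above can be shown with a library-free sketch. `SketchOutput` and `forward_sketch` are hypothetical stand-ins, not RBLN or HuggingFace classes; the sketch only mirrors the documented behavior that `None` falls back to a config default and that `False` yields a plain tuple.

```python
from dataclasses import astuple, dataclass
from typing import Optional, Tuple, Union


@dataclass
class SketchOutput:
    """Hypothetical stand-in for a dictionary-like model output object."""
    logits: list
    hidden_states: Optional[list] = None


def forward_sketch(
    logits: list,
    return_dict: Optional[bool] = None,
    use_return_dict: bool = True,
) -> Union[SketchOutput, Tuple]:
    # None falls back to the config default (use_return_dict in the text above).
    if return_dict is None:
        return_dict = use_return_dict
    output = SketchOutput(logits=logits)
    if return_dict:
        return output
    # Tuple form: keep only the fields that were actually populated.
    return tuple(v for v in astuple(output) if v is not None)


as_object = forward_sketch([0.1, 0.9])
as_tuple = forward_sketch([0.1, 0.9], return_dict=False)
print(as_object.logits)  # [0.1, 0.9]
print(as_tuple[0])       # [0.1, 0.9]
```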

Classes

RBLNBlip2VisionModelConfig

Bases: RBLNModelConfig

Configuration class for RBLNBlip2VisionModel.

This configuration class stores the configuration parameters specific to RBLN-optimized BLIP-2 vision encoder models for multimodal tasks.

Functions

__init__(cls_name=None, create_runtimes=None, optimize_host_memory=None, device=None, device_map=None, activate_profiler=None, npu=None, tensor_parallel_size=None, timeout=None, optimum_rbln_version=None, _torch_dtype=None, _compile_cfgs=[], **kwargs)

Initialize a RBLN model configuration with runtime options and compile configurations.

Parameters:

Name Type Description Default
cls_name Optional[str]

The class name of the configuration. Defaults to the current class name.

None
create_runtimes Optional[bool]

Whether to create RBLN runtimes. Defaults to True.

None
optimize_host_memory Optional[bool]

Whether to optimize host memory usage. Defaults to True.

None
device Optional[Union[int, List[int]]]

The device(s) to load the model onto. Can be a single device ID or a list.

None
device_map Optional[Dict[str, Union[int, List[int]]]]

Mapping from compiled model names to device IDs.

None
activate_profiler Optional[bool]

Whether to activate the profiler for performance analysis.

None
npu Optional[str]

The NPU device name to use for compilation.

None
tensor_parallel_size Optional[int]

Size for tensor parallelism to distribute the model across devices.

None
timeout Optional[int]

The timeout for the runtime in seconds. Defaults to 60 if not provided.

None
optimum_rbln_version Optional[str]

The optimum-rbln version used for this configuration.

None
_torch_dtype Optional[str]

The data type to use for the model.

None
_compile_cfgs List[RBLNCompileConfig]

List of compilation configurations for the model.

[]
kwargs Any

Additional keyword arguments.

{}

Raises:

Type Description
ValueError

If unexpected keyword arguments are provided.

load(path, **kwargs) classmethod

Load a RBLNModelConfig from a path.

Parameters:

Name Type Description Default
path str

Path to the RBLNModelConfig file or directory containing the config file.

required
kwargs Any

Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration.

{}

Returns:

Name Type Description
RBLNModelConfig RBLNModelConfig

The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
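
The load-then-override behavior described above can be sketched without the library. Everything here is illustrative: `load_config_sketch` is a hypothetical function, and the file name `rbln_config.json` and the JSON layout are assumptions rather than a documented format.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

CONFIG_NAME = "rbln_config.json"  # Assumed file name, for illustration only.


def load_config_sketch(path: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical loader: read a JSON config and apply keyword overrides."""
    config_path = Path(path)
    if config_path.is_dir():
        # A directory is accepted; the config file is looked up inside it.
        config_path = config_path / CONFIG_NAME
    config = json.loads(config_path.read_text())
    for key, value in kwargs.items():
        # Keys starting with 'rbln_' have the prefix removed before updating.
        name = key[len("rbln_"):] if key.startswith("rbln_") else key
        config[name] = value
    return config


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / CONFIG_NAME).write_text(json.dumps({"device": 0, "timeout": 60}))
    loaded = load_config_sketch(tmp, rbln_device=1)

print(loaded)  # {'device': 1, 'timeout': 60}
```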

RBLNBlip2QFormerModelConfig

Bases: RBLNModelConfig

Configuration class for RBLNBlip2QFormerModel.

This configuration class stores the configuration parameters specific to RBLN-optimized BLIP-2 Q-Former models that bridge vision and language modalities.

Functions

__init__(num_query_tokens=None, image_text_hidden_size=None, **kwargs)

Parameters:

Name Type Description Default
num_query_tokens Optional[int]

The number of query tokens passed through the Transformer.

None
image_text_hidden_size Optional[int]

Dimensionality of the hidden state of the image-text fusion layer.

None
kwargs Any

Additional arguments passed to the parent RBLNModelConfig.

{}

load(path, **kwargs) classmethod

Load a RBLNModelConfig from a path.

Parameters:

Name Type Description Default
path str

Path to the RBLNModelConfig file or directory containing the config file.

required
kwargs Any

Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration.

{}

Returns:

Name Type Description
RBLNModelConfig RBLNModelConfig

The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.

RBLNBlip2ForConditionalGenerationConfig

Bases: RBLNModelConfig

Functions

__init__(batch_size=None, vision_model=None, qformer=None, language_model=None, **kwargs)

Parameters:

Name Type Description Default
batch_size Optional[int]

The batch size for inference. Defaults to 1.

None
vision_model Optional[RBLNModelConfig]

Configuration for the vision encoder component.

None
qformer Optional[RBLNModelConfig]

Configuration for the RBLN-optimized BLIP-2 Q-Former model.

None
language_model Optional[RBLNModelConfig]

Configuration for the language model component.

None
kwargs Any

Additional arguments passed to the parent RBLNModelConfig.

{}

Raises:

Type Description
ValueError

If batch_size is not a positive integer.
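
The documented default and validation for `batch_size` can be sketched with a small dataclass. `Blip2ConfigSketch` is a hypothetical stand-in for RBLNBlip2ForConditionalGenerationConfig that models the sub-configurations as plain dictionaries; the exact checks and error message in the real class may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Blip2ConfigSketch:
    """Hypothetical stand-in for RBLNBlip2ForConditionalGenerationConfig."""
    batch_size: Optional[int] = None
    vision_model: Optional[Dict[str, Any]] = None
    qformer: Optional[Dict[str, Any]] = None
    language_model: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # None falls back to the documented default of 1.
        if self.batch_size is None:
            self.batch_size = 1
        # bool is a subclass of int, so it is excluded explicitly.
        is_int = isinstance(self.batch_size, int) and not isinstance(self.batch_size, bool)
        if not is_int or self.batch_size < 1:
            raise ValueError(f"batch_size must be a positive integer, got {self.batch_size!r}")


config = Blip2ConfigSketch(language_model={"tensor_parallel_size": 1})
print(config.batch_size)  # 1

try:
    Blip2ConfigSketch(batch_size=0)
except ValueError as err:
    print(err)  # batch_size must be a positive integer, got 0
```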

load(path, **kwargs) classmethod

Load a RBLNModelConfig from a path.

Parameters:

Name Type Description Default
path str

Path to the RBLNModelConfig file or directory containing the config file.

required
kwargs Any

Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration.

{}

Returns:

Name Type Description
RBLNModelConfig RBLNModelConfig

The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.