CLIP¶
CLIP (Contrastive Language-Image Pre-training)은 OpenAI에서 개발한 텍스트와 이미지를 연결하는 멀티모달 모델입니다. 시각적 정보와 텍스트 정보를 모두 이해할 수 있도록 인터넷에서 수집한 다양한 이미지-텍스트 쌍 데이터셋에서 훈련되었습니다. CLIP은 제로샷 분류를 수행할 수 있으며 강력한 이미지-텍스트 매칭 기능을 갖추고 있습니다. RBLN NPU는 Optimum RBLN을 사용하여 CLIP 모델 추론을 가속화할 수 있습니다.
API 참조¶
Classes¶
RBLNCLIPTextModel
¶
Bases: RBLNModel
RBLN optimized CLIP text encoder model.
This class provides hardware-accelerated inference for CLIP text encoders on RBLN devices, supporting text encoding for multimodal tasks.
Functions¶
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs)
classmethod
¶
Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
PreTrainedModel
|
The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. |
required |
config
|
Optional[PretrainedConfig]
|
The configuration object associated with the model. |
None
|
rbln_config
|
Optional[Union[RBLNModelConfig, Dict]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
The method performs the following steps:
- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
forward(input_ids, return_dict=None, **kwargs)
¶
Forward pass for the RBLN-optimized CLIP text encoder model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids
|
LongTensor
|
The input ids to the model. |
required |
return_dict
|
Optional[bool]
|
Whether to return a dictionary of outputs. |
None
|
from_pretrained(model_id, export=None, rbln_config=None, **kwargs)
classmethod
¶
The from_pretrained()
function is utilized in its standard form as in the HuggingFace transformers library.
User can use this function to load a pre-trained model from the HuggingFace library and convert it to a RBLN model to be run on RBLN NPUs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id
|
Union[str, Path]
|
The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler. |
required |
export
|
Optional[bool]
|
A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id. |
None
|
rbln_config
|
Optional[Union[Dict, RBLNModelConfig]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
save_pretrained(save_directory, push_to_hub=False, **kwargs)
¶
Saves a model and its configuration file to a directory, so that it can be re-loaded using the
[~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained
] class method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_directory
|
Union[str, Path]
|
Directory where to save the model file. |
required |
push_to_hub
|
bool
|
Whether or not to push your model to the HuggingFace model hub after saving it. |
False
|
RBLNCLIPTextModelWithProjection
¶
Bases: RBLNCLIPTextModel
RBLN optimized CLIP text encoder model with projection layer.
This class extends RBLNCLIPTextModel with a projection layer for multimodal embedding alignment tasks.
Functions¶
from_pretrained(model_id, export=None, rbln_config=None, **kwargs)
classmethod
¶
The from_pretrained()
function is utilized in its standard form as in the HuggingFace transformers library.
User can use this function to load a pre-trained model from the HuggingFace library and convert it to a RBLN model to be run on RBLN NPUs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id
|
Union[str, Path]
|
The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler. |
required |
export
|
Optional[bool]
|
A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id. |
None
|
rbln_config
|
Optional[Union[Dict, RBLNModelConfig]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
save_pretrained(save_directory, push_to_hub=False, **kwargs)
¶
Saves a model and its configuration file to a directory, so that it can be re-loaded using the
[~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained
] class method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_directory
|
Union[str, Path]
|
Directory where to save the model file. |
required |
push_to_hub
|
bool
|
Whether or not to push your model to the HuggingFace model hub after saving it. |
False
|
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs)
classmethod
¶
Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
PreTrainedModel
|
The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. |
required |
config
|
Optional[PretrainedConfig]
|
The configuration object associated with the model. |
None
|
rbln_config
|
Optional[Union[RBLNModelConfig, Dict]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
The method performs the following steps:
- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
forward(input_ids, return_dict=None, **kwargs)
¶
Forward pass for the RBLN-optimized CLIP text encoder model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids
|
LongTensor
|
The input ids to the model. |
required |
return_dict
|
Optional[bool]
|
Whether to return a dictionary of outputs. |
None
|
RBLNCLIPVisionModel
¶
Bases: RBLNModel
RBLN optimized CLIP vision encoder model.
This class provides hardware-accelerated inference for CLIP vision encoders on RBLN devices, supporting image encoding for multimodal tasks.
Functions¶
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs)
classmethod
¶
Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
PreTrainedModel
|
The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. |
required |
config
|
Optional[PretrainedConfig]
|
The configuration object associated with the model. |
None
|
rbln_config
|
Optional[Union[RBLNModelConfig, Dict]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
The method performs the following steps:
- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
forward(pixel_values, return_dict=True, output_attentions=None, output_hidden_states=None, interpolate_pos_encoding=False, **kwargs)
¶
Forward pass for the RBLN-optimized CLIP vision encoder model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pixel_values
|
Tensor
|
The pixel values to the model. |
required |
return_dict
|
bool
|
Whether to return a dictionary of outputs. |
True
|
output_attentions
|
Optional[bool]
|
Whether to return attentions. |
None
|
output_hidden_states
|
Optional[bool]
|
Whether to return hidden states. |
None
|
interpolate_pos_encoding
|
bool
|
Whether to interpolate position encoding. |
False
|
from_pretrained(model_id, export=None, rbln_config=None, **kwargs)
classmethod
¶
The from_pretrained()
function is utilized in its standard form as in the HuggingFace transformers library.
User can use this function to load a pre-trained model from the HuggingFace library and convert it to a RBLN model to be run on RBLN NPUs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id
|
Union[str, Path]
|
The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler. |
required |
export
|
Optional[bool]
|
A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id. |
None
|
rbln_config
|
Optional[Union[Dict, RBLNModelConfig]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
save_pretrained(save_directory, push_to_hub=False, **kwargs)
¶
Saves a model and its configuration file to a directory, so that it can be re-loaded using the
[~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained
] class method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_directory
|
Union[str, Path]
|
Directory where to save the model file. |
required |
push_to_hub
|
bool
|
Whether or not to push your model to the HuggingFace model hub after saving it. |
False
|
RBLNCLIPVisionModelWithProjection
¶
Bases: RBLNCLIPVisionModel
RBLN optimized CLIP vision encoder model with projection layer.
This class extends RBLNCLIPVisionModel with a projection layer for multimodal embedding alignment tasks.
Functions¶
from_pretrained(model_id, export=None, rbln_config=None, **kwargs)
classmethod
¶
The from_pretrained()
function is utilized in its standard form as in the HuggingFace transformers library.
User can use this function to load a pre-trained model from the HuggingFace library and convert it to a RBLN model to be run on RBLN NPUs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id
|
Union[str, Path]
|
The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler. |
required |
export
|
Optional[bool]
|
A boolean flag to indicate whether the model should be compiled. If None, it will be determined based on the existence of the compiled model files in the model_id. |
None
|
rbln_config
|
Optional[Union[Dict, RBLNModelConfig]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
save_pretrained(save_directory, push_to_hub=False, **kwargs)
¶
Saves a model and its configuration file to a directory, so that it can be re-loaded using the
[~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained
] class method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_directory
|
Union[str, Path]
|
Directory where to save the model file. |
required |
push_to_hub
|
bool
|
Whether or not to push your model to the HuggingFace model hub after saving it. |
False
|
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs)
classmethod
¶
Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
PreTrainedModel
|
The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. |
required |
config
|
Optional[PretrainedConfig]
|
The configuration object associated with the model. |
None
|
rbln_config
|
Optional[Union[RBLNModelConfig, Dict]]
|
Configuration for RBLN model compilation and runtime.
This can be provided as a dictionary or an instance of the model's configuration class (e.g., |
None
|
kwargs
|
Any
|
Additional keyword arguments. Arguments with the prefix |
{}
|
The method performs the following steps:
- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations
Returns:
Type | Description |
---|---|
RBLNModel
|
A RBLN model instance ready for inference on RBLN NPU devices. |
forward(pixel_values, return_dict=True, output_attentions=None, output_hidden_states=None, interpolate_pos_encoding=False, **kwargs)
¶
Forward pass for the RBLN-optimized CLIP vision encoder model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pixel_values
|
Tensor
|
The pixel values to the model. |
required |
return_dict
|
bool
|
Whether to return a dictionary of outputs. |
True
|
output_attentions
|
Optional[bool]
|
Whether to return attentions. |
None
|
output_hidden_states
|
Optional[bool]
|
Whether to return hidden states. |
None
|
interpolate_pos_encoding
|
bool
|
Whether to interpolate position encoding. |
False
|
Functions¶
Classes¶
RBLNCLIPTextModelConfig
¶
Bases: RBLNModelConfig
Functions¶
__init__(batch_size=None, **kwargs)
¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size
|
Optional[int]
|
The batch size for text processing. Defaults to 1. |
None
|
kwargs
|
Any
|
Additional arguments passed to the parent RBLNModelConfig. |
{}
|
Raises:
Type | Description |
---|---|
ValueError
|
If |
load(path, **kwargs)
classmethod
¶
Load a RBLNModelConfig from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path
|
str
|
Path to the RBLNModelConfig file or directory containing the config file. |
required |
kwargs
|
Any
|
Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. |
{}
|
Returns:
Name | Type | Description |
---|---|---|
RBLNModelConfig |
RBLNModelConfig
|
The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNCLIPTextModelWithProjectionConfig
¶
Bases: RBLNCLIPTextModelConfig
Configuration class for RBLNCLIPTextModelWithProjection.
This configuration inherits from RBLNCLIPTextModelConfig and stores configuration parameters for CLIP text models with projection layers.
Functions¶
__init__(batch_size=None, **kwargs)
¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size
|
Optional[int]
|
The batch size for text processing. Defaults to 1. |
None
|
kwargs
|
Any
|
Additional arguments passed to the parent RBLNModelConfig. |
{}
|
Raises:
Type | Description |
---|---|
ValueError
|
If |
load(path, **kwargs)
classmethod
¶
Load a RBLNModelConfig from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path
|
str
|
Path to the RBLNModelConfig file or directory containing the config file. |
required |
kwargs
|
Any
|
Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. |
{}
|
Returns:
Name | Type | Description |
---|---|---|
RBLNModelConfig |
RBLNModelConfig
|
The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNCLIPVisionModelConfig
¶
Bases: RBLNModelConfig
Functions¶
__init__(batch_size=None, image_size=None, interpolate_pos_encoding=None, output_hidden_states=None, output_attentions=None, **kwargs)
¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size
|
Optional[int]
|
The batch size for image processing. Defaults to 1. |
None
|
image_size
|
Optional[int]
|
The size of input images. Can be an integer for square images, a tuple/list (height, width), or a dictionary with 'height' and 'width' keys. |
None
|
interpolate_pos_encoding
|
Optional[bool]
|
Whether or not to interpolate pre-trained position encodings. Defaults to |
None
|
output_hidden_states
|
Optional[bool]
|
Whether or not to return the hidden states of all layers. |
None
|
output_attentions
|
Optional[bool]
|
Whether or not to return the attentions tensors of all attention layers |
None
|
kwargs
|
Any
|
Additional arguments passed to the parent RBLNModelConfig. |
{}
|
Raises:
Type | Description |
---|---|
ValueError
|
If |
load(path, **kwargs)
classmethod
¶
Load a RBLNModelConfig from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path
|
str
|
Path to the RBLNModelConfig file or directory containing the config file. |
required |
kwargs
|
Any
|
Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. |
{}
|
Returns:
Name | Type | Description |
---|---|---|
RBLNModelConfig |
RBLNModelConfig
|
The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNCLIPVisionModelWithProjectionConfig
¶
Bases: RBLNCLIPVisionModelConfig
Configuration class for RBLNCLIPVisionModelWithProjection.
This configuration inherits from RBLNCLIPVisionModelConfig and stores configuration parameters for CLIP vision models with projection layers.
Functions¶
__init__(batch_size=None, image_size=None, interpolate_pos_encoding=None, output_hidden_states=None, output_attentions=None, **kwargs)
¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size
|
Optional[int]
|
The batch size for image processing. Defaults to 1. |
None
|
image_size
|
Optional[int]
|
The size of input images. Can be an integer for square images, a tuple/list (height, width), or a dictionary with 'height' and 'width' keys. |
None
|
interpolate_pos_encoding
|
Optional[bool]
|
Whether or not to interpolate pre-trained position encodings. Defaults to |
None
|
output_hidden_states
|
Optional[bool]
|
Whether or not to return the hidden states of all layers. |
None
|
output_attentions
|
Optional[bool]
|
Whether or not to return the attentions tensors of all attention layers |
None
|
kwargs
|
Any
|
Additional arguments passed to the parent RBLNModelConfig. |
{}
|
Raises:
Type | Description |
---|---|
ValueError
|
If |
load(path, **kwargs)
classmethod
¶
Load a RBLNModelConfig from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path
|
str
|
Path to the RBLNModelConfig file or directory containing the config file. |
required |
kwargs
|
Any
|
Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. |
{}
|
Returns:
Name | Type | Description |
---|---|---|
RBLNModelConfig |
RBLNModelConfig
|
The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.