CLIP¶
CLIP (Contrastive Language-Image Pre-training) is a multimodal model developed by OpenAI that connects text and images. It is trained on a large, diverse set of image-text pairs from the internet and learns a shared embedding space for both modalities, which gives it strong image-text matching and zero-shot classification capabilities. Optimum RBLN lets you accelerate CLIP model inference on RBLN NPUs.
Key Classes¶
- RBLNCLIPTextModel: CLIP text model implementation for text encoding on RBLN NPUs
- RBLNCLIPTextModelConfig: Configuration class for the CLIP text model
- RBLNCLIPTextModelWithProjection: CLIP text model with projection for multimodal tasks
- RBLNCLIPTextModelWithProjectionConfig: Configuration class for the CLIP text model with projection
- RBLNCLIPVisionModel: CLIP vision model implementation for image encoding on RBLN NPUs
- RBLNCLIPVisionModelConfig: Configuration class for the CLIP vision model
- RBLNCLIPVisionModelWithProjection: CLIP vision model with projection for multimodal tasks
- RBLNCLIPVisionModelWithProjectionConfig: Configuration class for the CLIP vision model with projection
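As a quick orientation, here is a minimal sketch of compiling and running the CLIP text encoder. It assumes the classes are imported from optimum.rbln and uses openai/clip-vit-base-patch32 purely as an illustrative checkpoint; padding to the model's maximum length is assumed because compiled models run with static input shapes.

```python
from optimum.rbln import RBLNCLIPTextModel
from transformers import CLIPTokenizer

# Compile the CLIP text encoder for RBLN NPUs (export=True triggers compilation).
model = RBLNCLIPTextModel.from_pretrained(
    "openai/clip-vit-base-patch32",
    export=True,
)

# Tokenize with static-length padding, since the compiled graph expects fixed shapes.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
inputs = tokenizer(["a photo of a cat"], padding="max_length", return_tensors="pt")

# Run inference on the NPU-compiled model.
outputs = model(input_ids=inputs.input_ids)
```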
API Reference¶
Classes¶
RBLNCLIPTextModel¶
Bases: RBLNModel
Functions¶
forward(input_ids, return_dict=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_ids | LongTensor | Input IDs. | required |
| return_dict | bool | Whether or not to return a ModelOutput instead of a plain tuple. | None |
| **kwargs | Dict[str, Any] | Additional arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| Union[CLIPTextModelOutput, Tuple] | The output of the model. |
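A short sketch of the return_dict toggle, continuing from the compile-and-tokenize snippet above; the exact output fields follow the transformers CLIP output conventions and are assumptions here:

```python
# Default: a ModelOutput whose fields follow the transformers CLIP conventions.
out = model(input_ids=inputs.input_ids)

# return_dict=False: the same values returned as a plain tuple.
out_tuple = model(input_ids=inputs.input_ids, return_dict=False)
```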
from_pretrained(model_id, export=False, rbln_config=None, **kwargs) classmethod¶

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace hub (or a local path) and convert it into an RBLN model that runs on RBLN NPUs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_id | Union[str, Path] | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required |
| export | bool | Whether the model should be compiled. | False |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPTextModelConfig for RBLNCLIPTextModel). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
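Both ways of supplying compile-time options are sketched below; rbln_batch_size illustrates the rbln_-prefix routing described in the table, with batch_size taken from the RBLNCLIPTextModelConfig documented later on this page.

```python
from optimum.rbln import RBLNCLIPTextModel

# Prefixed keyword arguments are routed into rbln_config ...
model = RBLNCLIPTextModel.from_pretrained(
    "openai/clip-vit-base-patch32",
    export=True,
    rbln_batch_size=2,
)

# ... which is equivalent to passing a dictionary explicitly.
model = RBLNCLIPTextModel.from_pretrained(
    "openai/clip-vit-base-patch32",
    export=True,
    rbln_config={"batch_size": 2},
)
```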
from_model(model, *, rbln_config=None, **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | PreTrainedModel | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. | required |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPTextModelConfig for RBLNCLIPTextModel). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
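A sketch of the from_model path, assuming you already hold an in-memory transformers CLIPTextModel:

```python
from transformers import CLIPTextModel
from optimum.rbln import RBLNCLIPTextModel

# Load the PyTorch model first, then convert and compile it for the NPU.
torch_model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
rbln_model = RBLNCLIPTextModel.from_model(torch_model, rbln_config={"batch_size": 1})
```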
save_pretrained(save_directory)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | Union[str, PathLike] | The directory to save the model and its configuration files. Will be created if it doesn't exist. | required |
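A save-and-reload round trip might look like the following; the directory name is arbitrary, and export is left at its False default on reload because the saved artifact is already compiled:

```python
# Save the compiled model together with its configuration files ...
model.save_pretrained("./clip-text-rbln")

# ... then reload it later without recompiling.
model = RBLNCLIPTextModel.from_pretrained("./clip-text-rbln")
```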
RBLNCLIPTextModelWithProjection¶
Bases: RBLNCLIPTextModel
Functions¶
from_pretrained(model_id, export=False, rbln_config=None, **kwargs) classmethod¶

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace hub (or a local path) and convert it into an RBLN model that runs on RBLN NPUs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_id | Union[str, Path] | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required |
| export | bool | Whether the model should be compiled. | False |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPTextModelWithProjectionConfig for RBLNCLIPTextModelWithProjection). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
from_model(model, *, rbln_config=None, **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | PreTrainedModel | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. | required |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPTextModelWithProjectionConfig for RBLNCLIPTextModelWithProjection). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
save_pretrained(save_directory)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | Union[str, PathLike] | The directory to save the model and its configuration files. Will be created if it doesn't exist. | required |
forward(input_ids, return_dict=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_ids | LongTensor | Input IDs. | required |
| return_dict | bool | Whether or not to return a ModelOutput instead of a plain tuple. | None |
| **kwargs | Dict[str, Any] | Additional arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| Union[CLIPTextModelOutput, Tuple] | The output of the model. |
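The projection variant is used the same way as the base text model; the sketch below assumes the output exposes the projected embedding under the text_embeds attribute, following the transformers CLIPTextModelOutput convention:

```python
from optimum.rbln import RBLNCLIPTextModelWithProjection
from transformers import CLIPTokenizer

model = RBLNCLIPTextModelWithProjection.from_pretrained(
    "openai/clip-vit-base-patch32", export=True
)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
inputs = tokenizer(["a photo of a dog"], padding="max_length", return_tensors="pt")

# text_embeds holds the projected text embedding (transformers convention).
text_embeds = model(input_ids=inputs.input_ids).text_embeds
```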
RBLNCLIPVisionModel¶
Bases: RBLNModel
Functions¶
forward(pixel_values=None, return_dict=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pixel_values | Optional[FloatTensor] | Input pixel values. | None |
| return_dict | bool | Whether or not to return a ModelOutput instead of a plain tuple. | None |
| **kwargs | Dict[str, Any] | Additional arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| Union[CLIPVisionModelOutput, Tuple] | The output of the model. |
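A minimal sketch for the vision side, assuming a local image file cat.png and the standard transformers CLIPImageProcessor for preprocessing:

```python
from optimum.rbln import RBLNCLIPVisionModel
from transformers import CLIPImageProcessor
from PIL import Image

model = RBLNCLIPVisionModel.from_pretrained(
    "openai/clip-vit-base-patch32", export=True
)
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Preprocess an RGB image into the fixed-size pixel_values tensor.
image = Image.open("cat.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

outputs = model(pixel_values=pixel_values)
```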
from_pretrained(model_id, export=False, rbln_config=None, **kwargs) classmethod¶

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace hub (or a local path) and convert it into an RBLN model that runs on RBLN NPUs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_id | Union[str, Path] | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required |
| export | bool | Whether the model should be compiled. | False |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPVisionModelConfig for RBLNCLIPVisionModel). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
from_model(model, *, rbln_config=None, **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | PreTrainedModel | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. | required |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPVisionModelConfig for RBLNCLIPVisionModel). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
save_pretrained(save_directory)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | Union[str, PathLike] | The directory to save the model and its configuration files. Will be created if it doesn't exist. | required |
RBLNCLIPVisionModelWithProjection¶
Bases: RBLNCLIPVisionModel
Functions¶
from_pretrained(model_id, export=False, rbln_config=None, **kwargs) classmethod¶

The from_pretrained() function is used in its standard form, as in the HuggingFace transformers library. Use it to load a pre-trained model from the HuggingFace hub (or a local path) and convert it into an RBLN model that runs on RBLN NPUs.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_id | Union[str, Path] | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required |
| export | bool | Whether the model should be compiled. | False |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPVisionModelWithProjectionConfig for RBLNCLIPVisionModelWithProjection). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
from_model(model, *, rbln_config=None, **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | PreTrainedModel | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. | required |
| rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class (e.g., RBLNCLIPVisionModelWithProjectionConfig for RBLNCLIPVisionModelWithProjection). | None |
| kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix rbln_ are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

| Type | Description |
| --- | --- |
| Self | An RBLN model instance ready for inference on RBLN NPU devices. |
save_pretrained(save_directory)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| save_directory | Union[str, PathLike] | The directory to save the model and its configuration files. Will be created if it doesn't exist. | required |
forward(pixel_values=None, return_dict=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pixel_values | Optional[FloatTensor] | Input pixel values. | None |
| return_dict | bool | Whether or not to return a ModelOutput instead of a plain tuple. | None |
| **kwargs | Dict[str, Any] | Additional arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| Union[CLIPVisionModelOutput, Tuple] | The output of the model. |
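Combining the two projection models gives CLIP-style image-text matching. The sketch below assumes text_embeds and image_embeds were obtained from the compiled *WithProjection models as in the earlier snippets:

```python
# L2-normalize both embeddings, then compare with a cosine-similarity matrix.
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
similarity = text_embeds @ image_embeds.T  # shape: (num_texts, num_images)
```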
Configuration Classes¶
RBLNCLIPTextModelConfig¶
Bases: RBLNModelConfig
Functions¶
__init__(batch_size=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch_size | Optional[int] | The batch size for text processing. Defaults to 1. | None |
| **kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If batch_size is not a positive integer. |
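A config instance can also be built explicitly and handed to from_pretrained via rbln_config, as a typed alternative to the dictionary form sketched earlier:

```python
from optimum.rbln import RBLNCLIPTextModel, RBLNCLIPTextModelConfig

config = RBLNCLIPTextModelConfig(batch_size=4)
model = RBLNCLIPTextModel.from_pretrained(
    "openai/clip-vit-base-patch32", export=True, rbln_config=config
)
```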
RBLNCLIPTextModelWithProjectionConfig¶
Bases: RBLNCLIPTextModelConfig
Functions¶
__init__(batch_size=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch_size | Optional[int] | The batch size for text processing. Defaults to 1. | None |
| **kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If batch_size is not a positive integer. |
RBLNCLIPVisionModelConfig¶
Bases: RBLNModelConfig
Functions¶
__init__(batch_size=None, image_size=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch_size | Optional[int] | The batch size for image processing. Defaults to 1. | None |
| image_size | Optional[int] | The size of input images. Can be an integer for square images, a tuple/list (height, width), or a dictionary with 'height' and 'width' keys. | None |
| **kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If batch_size is not a positive integer. |
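The three accepted image_size forms from the table above, sketched side by side (224 matches the default resolution of the illustrative ViT-B/32 checkpoint):

```python
from optimum.rbln import RBLNCLIPVisionModelConfig

cfg_square = RBLNCLIPVisionModelConfig(batch_size=1, image_size=224)
cfg_pair = RBLNCLIPVisionModelConfig(batch_size=1, image_size=(224, 224))
cfg_dict = RBLNCLIPVisionModelConfig(batch_size=1, image_size={"height": 224, "width": 224})
```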
RBLNCLIPVisionModelWithProjectionConfig¶
Bases: RBLNCLIPVisionModelConfig
Functions¶
__init__(batch_size=None, image_size=None, **kwargs)¶

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| batch_size | Optional[int] | The batch size for image processing. Defaults to 1. | None |
| image_size | Optional[int] | The size of input images. Can be an integer for square images, a tuple/list (height, width), or a dictionary with 'height' and 'width' keys. | None |
| **kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If batch_size is not a positive integer. |