ColPali
ColPali is a Vision Language Model (VLM) that uses a novel architecture and training strategy to efficiently index documents from their visual features. RBLN NPUs can accelerate ColPali inference through Optimum RBLN.
Classes
RBLNColPaliForRetrieval
Bases: RBLNModel
The ColPali model is a transformer for document retrieval using vision-language models.
This model inherits from RBLNModel. Check the superclass documentation for the generic methods the library implements for all its models.
A class for converting and running pre-trained transformers-based ColPaliForRetrieval models on RBLN devices.
It implements the methods to convert a pre-trained ColPaliForRetrieval model into an RBLN transformer model by:
- transferring the checkpoint weights of the original model into an optimized RBLN graph,
- compiling the resulting graph using the RBLN compiler.
Configuration:
This model uses RBLNColPaliForRetrievalConfig for configuration. When calling methods like from_pretrained or from_model, the rbln_config parameter should be an instance of RBLNColPaliForRetrievalConfig or a dictionary conforming to its structure.
See the RBLNColPaliForRetrievalConfig class for all available configuration options.
Examples:
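A minimal sketch of compiling a checkpoint with from_pretrained, assuming the optimum-rbln package is installed and an RBLN NPU is available. The helper name compile_colpali and the dictionary key passed through rbln_config are illustrative assumptions based on the parameters listed in this reference; the checkpoint ID is whatever ColPali model you use.

```python
from typing import List


def compile_colpali(model_id: str, max_seq_lens: List[int]):
    """Compile a ColPali checkpoint for RBLN NPUs (hypothetical helper).

    Only from_pretrained() and its export / rbln_config parameters are
    taken from this reference; everything else is illustrative.
    """
    # Imported lazily so this module can be imported without the RBLN SDK.
    from optimum.rbln import RBLNColPaliForRetrieval

    return RBLNColPaliForRetrieval.from_pretrained(
        model_id,
        export=True,  # request compilation of the original checkpoint
        # Assumption: a dictionary mirroring RBLNColPaliForRetrievalConfig.
        rbln_config={"max_seq_lens": max_seq_lens},
    )
```

Calling compile_colpali with a ColPali model ID and a list of sequence lengths should return a compiled model, provided the SDK and a device are present.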
Functions
from_pretrained(model_id, export=False, rbln_config=None, **kwargs)
classmethod
The from_pretrained() method is used in its standard form, as in the HuggingFace transformers library.
Use it to load a pre-trained model from the HuggingFace library and convert it to an RBLN model that runs on RBLN NPUs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | Union[str, Path] | The model ID of the pre-trained model to load. It can be a model ID on the HuggingFace model hub, a local path, or the model ID of a model already compiled with the RBLN Compiler. | required |
export | bool | A boolean flag to indicate whether the model should be compiled. | False |
rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | None |
kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix 'rbln_' are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |
Returns:
Type | Description |
---|---|
Self | An RBLN model instance ready for inference on RBLN NPU devices. |
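The 'rbln_' prefix convention described for kwargs can be pictured with a small standalone sketch. The helper split_rbln_kwargs is hypothetical and not part of the library, and stripping the prefix before forwarding is an assumption about how the routing works.

```python
from typing import Any, Dict, Tuple

RBLN_PREFIX = "rbln_"


def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split keyword arguments into (RBLN options, HuggingFace kwargs).

    Hypothetical illustration: keys starting with "rbln_" are routed to the
    RBLN configuration with the prefix removed; all other keys are left for
    the HuggingFace library.
    """
    rbln_options: Dict[str, Any] = {}
    hf_kwargs: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key.startswith(RBLN_PREFIX):
            rbln_options[key[len(RBLN_PREFIX):]] = value
        else:
            hf_kwargs[key] = value
    return rbln_options, hf_kwargs


rbln_options, hf_kwargs = split_rbln_kwargs(
    {"rbln_max_seq_lens": [1024, 2048], "torch_dtype": "float32"}
)
print(rbln_options)  # {'max_seq_lens': [1024, 2048]}
print(hf_kwargs)     # {'torch_dtype': 'float32'}
```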
from_model(model, *, rbln_config=None, **kwargs)
classmethod
Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | PreTrainedModel | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers PreTrainedModel class. | required |
rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | None |
kwargs | Dict[str, Any] | Additional keyword arguments. Arguments with the prefix 'rbln_' are passed to rbln_config, while the remaining arguments are passed to the HuggingFace library. | {} |
The method performs the following steps:
- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations
Returns:
Type | Description |
---|---|
Self | An RBLN model instance ready for inference on RBLN NPU devices. |
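As a rough sketch of the from_model path, the original PyTorch model is loaded first and then handed over for compilation. This assumes a transformers release that ships ColPaliForRetrieval alongside an installed optimum-rbln; the helper name compile_from_loaded_model is hypothetical.

```python
def compile_from_loaded_model(model_id: str):
    """Load the original PyTorch model, then compile it with from_model().

    Hypothetical helper; model_id is whichever ColPali checkpoint you use.
    """
    # Lazy imports: both packages are only needed when the helper is called.
    from transformers import ColPaliForRetrieval
    from optimum.rbln import RBLNColPaliForRetrieval

    torch_model = ColPaliForRetrieval.from_pretrained(model_id)
    # rbln_config is omitted, so the library's default configuration applies.
    return RBLNColPaliForRetrieval.from_model(torch_model)
```

This route is mainly useful when the PyTorch model is already in memory; otherwise from_pretrained with export=True covers loading and compiling in one call.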
save_pretrained(save_directory)
Saves a model and its configuration file to a directory so that it can be reloaded using the from_pretrained class method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_directory | Union[str, PathLike] | The directory to save the model and its configuration files. Will be created if it doesn't exist. | required |
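A short sketch of the save and reload round trip. The helper save_and_reload is hypothetical; it relies only on the save_pretrained and from_pretrained signatures listed in this reference, and it assumes that loading a saved directory with export=False reuses the compiled artifacts instead of recompiling.

```python
from pathlib import Path
from typing import Union


def save_and_reload(model, save_directory: Union[str, Path]):
    """Save a compiled RBLN model, then load it back from disk."""
    model.save_pretrained(save_directory)
    # export=False is the default; it is spelled out here to make clear
    # that no new compilation is requested.
    return type(model).from_pretrained(save_directory, export=False)
```

Because the helper only calls methods on the object it receives, it works with any class that follows the same save_pretrained / from_pretrained contract.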
Classes
RBLNColPaliForRetrievalConfig
Bases: RBLNModelConfig
Configuration class for RBLN ColPali models for document retrieval.
This class extends RBLNModelConfig with specific configurations for ColPali models, including vision tower settings and multi-sequence length support.
Example usage:
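A minimal sketch of building the configuration object, assuming optimum-rbln is installed. The helper name build_colpali_config and the chosen argument values are illustrative; only the constructor parameters come from this reference.

```python
from typing import List


def build_colpali_config(max_seq_lens: List[int]):
    """Create a ColPali RBLN config covering several sequence lengths.

    Hypothetical helper around the documented constructor.
    """
    # Imported lazily so this module can be imported without the RBLN SDK.
    from optimum.rbln import RBLNColPaliForRetrievalConfig

    return RBLNColPaliForRetrievalConfig(
        max_seq_lens=max_seq_lens,   # one compiled variant per length
        output_hidden_states=False,  # illustrative choice
    )
```

The returned object can then be passed as rbln_config to from_pretrained or from_model.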
Functions
__init__(max_seq_lens=None, output_hidden_states=None, vision_tower=None, **kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vision_tower | Optional[RBLNModelConfig] | Configuration for the vision encoder component. | None |
max_seq_lens | Union[int, List[int]] | The maximum sequence lengths for the language model. This can be multiple values, and the model will be compiled for each max_seq_len, allowing selection of the most appropriate max_seq_len at inference time. | None |
output_hidden_states | Optional[bool] | Whether to output the hidden states of the language model. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
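To illustrate what selecting the most appropriate max_seq_len could mean, the sketch below picks the smallest compiled length that still fits the input. The function pick_max_seq_len and this selection rule are assumptions for illustration; the reference does not specify how the library chooses.

```python
from bisect import bisect_left
from typing import List, Union


def pick_max_seq_len(input_len: int, max_seq_lens: Union[int, List[int]]) -> int:
    """Return the smallest compiled length that can hold input_len tokens."""
    if isinstance(max_seq_lens, int):
        candidates = [max_seq_lens]
    else:
        candidates = sorted(max_seq_lens)
    if not candidates:
        raise ValueError("max_seq_lens must contain at least one length")
    # Index of the first candidate that is >= input_len.
    index = bisect_left(candidates, input_len)
    if index == len(candidates):
        raise ValueError(
            f"input length {input_len} exceeds the largest compiled length {candidates[-1]}"
        )
    return candidates[index]


print(pick_max_seq_len(700, [2048, 1024]))   # 1024
print(pick_max_seq_len(1500, [1024, 2048]))  # 2048
```

Under this rule, compiling for several lengths lets short inputs run on a smaller graph while longer inputs fall back to a larger one.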