Common¶
API Reference¶
Classes¶
RBLNDecoderOnlyModelForCausalLM¶

Bases: RBLNDecoderOnlyModel, RBLNDecoderOnlyGenerationMixin
A base class for decoder-only transformer models optimized for causal language modeling tasks on RBLN devices. This class serves as the foundation for various decoder-only architectures like GPT, LLaMA, etc.
The class provides core functionality for:
- Converting pre-trained transformer models to RBLN-optimized format
- Handling the compilation process for RBLN devices
- Managing inference operations for causal language modeling
This class inherits from RBLNModel and implements specific methods required for decoder-only architectures and causal language modeling tasks.
Note
- This class is designed to be subclassed by specific model implementations (e.g., RBLNLlamaForCausalLM, RBLNGPT2LMHeadModel).
- Subclasses should implement model-specific conversion logic.
- The class handles RBLN-specific optimizations automatically during compilation.
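For orientation, here is a minimal end-to-end sketch using the `RBLNLlamaForCausalLM` subclass mentioned above. The checkpoint id is a placeholder and the settings are illustrative, not a definitive recipe:

```python
from transformers import AutoTokenizer
from optimum.rbln import RBLNLlamaForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint

# export=True triggers compilation of the HuggingFace checkpoint into an RBLN model.
model = RBLNLlamaForCausalLM.from_pretrained(model_id, export=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, my name is", return_tensors="pt")

# Inference runs on the RBLN NPU through the standard generate() interface.
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```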
Functions¶
generate(input_ids, attention_mask=None, max_length=None, **kwargs)¶

The generate function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to generate text from the model.

Parameters:

Name | Type | Description | Default
---|---|---|---
`input_ids` | `LongTensor` | The input ids to the model. | required
`attention_mask` | `Optional[LongTensor]` | The attention mask to the model. | `None`
`max_length` | `Optional[int]` | The maximum length of the generated text. | `None`
`kwargs` | | Additional arguments passed to the generate function. See the HuggingFace transformers documentation for more details. | `{}`
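As an illustration of the parameters above, a hedged sketch of batched generation. It assumes the model was compiled with a matching batch size (e.g., `rbln_config={"batch_size": 2}` at export time); left-padding follows standard transformers practice for decoder-only models:

```python
# Assumes `model` and `tokenizer` were created as in the loading example above,
# with the model compiled for batch_size=2.
prompts = ["The capital of France is", "RBLN NPUs are"]
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(prompts, return_tensors="pt", padding=True)

# input_ids / attention_mask / max_length map directly onto the parameters above.
output_ids = model.generate(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    max_length=32,
)
for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
    print(text)
```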
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
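A short round-trip sketch (the directory name is a placeholder). Saving writes the compiled artifacts, so a later `from_pretrained` on the same directory infers `export=False`:

```python
# Assumes `model` is a compiled RBLN model (see from_pretrained above).
model.save_pretrained("./llama-rbln")

# Compiled model files now exist in the directory, so export is inferred as False.
from optimum.rbln import RBLNLlamaForCausalLM
reloaded = RBLNLlamaForCausalLM.from_pretrained("./llama-rbln")
```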
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
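A hedged sketch of `from_model`, for when you already hold a `PreTrainedModel` instance (the checkpoint id is a placeholder; the dictionary keys mirror the configuration parameters documented below):

```python
from transformers import AutoModelForCausalLM
from optimum.rbln import RBLNLlamaForCausalLM

# Load (or build) the PyTorch model yourself, then hand it to from_model for compilation.
torch_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # placeholder id
rbln_model = RBLNLlamaForCausalLM.from_model(
    torch_model,
    rbln_config={"batch_size": 1, "max_seq_len": 4096},
)
```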
Classes¶
RBLNDecoderOnlyModelForCausalLMConfig¶

Bases: RBLNDecoderOnlyModelConfig
Configuration class for RBLN decoder-only models for Causal Language Modeling.
This class extends RBLNModelConfig with parameters specific to decoder-only transformer architectures optimized for RBLN devices. It controls aspects like attention implementation, KV cache management, and batching for inference.
Functions¶
__init__(batch_size=None, max_seq_len=None, use_inputs_embeds=None, use_attention_mask=None, use_position_ids=None, attn_impl=None, kvcache_partition_len=None, kvcache_block_size=None, quantization=None, prefill_chunk_size=None, kvcache_num_blocks=None, decoder_batch_sizes=None, cache_impl=None, sliding_window=None, sliding_window_layers=None, phases=None, logits_to_keep=None, **kwargs)¶

Parameters:

Name | Type | Description | Default
---|---|---|---
`batch_size` | `Optional[int]` | The batch size for inference. Defaults to 1. | `None`
`max_seq_len` | `Optional[int]` | The maximum sequence length supported by the model. If not provided, it is inferred from the model's configuration where possible (e.g., `max_position_embeddings`). | `None`
`use_inputs_embeds` | `Optional[bool]` | Whether to use input embeddings (`inputs_embeds`) instead of `input_ids`. Defaults to False. | `None`
`use_attention_mask` | `Optional[bool]` | Whether the model requires attention masks during inference. This is typically determined based on the target device and model architecture. Defaults are often set automatically based on the model and RBLN NPU. | `None`
`use_position_ids` | `Optional[bool]` | Whether to use position IDs. Defaults to False. | `None`
`attn_impl` | `Optional[str]` | Specifies the attention implementation to use. See the "Attention Implementation" section below for details. | `None`
`kvcache_partition_len` | `Optional[int]` | Defines the partition length for the KV cache when using "flash_attn". See the "KV Cache Partition Length" section below for details. | `None`
`kvcache_block_size` | `Optional[int]` | Sets the size (in number of tokens) of each block in the PagedAttention KV cache. When using "flash_attn", it must equal `kvcache_partition_len`. | `None`
`prefill_chunk_size` | `Optional[int]` | The chunk size used during the prefill phase for processing input sequences. Defaults to 128. Must be a positive integer divisible by 64. Affects prefill performance and memory usage. | `None`
`kvcache_num_blocks` | `Optional[int]` | The total number of blocks to allocate for the PagedAttention KV cache. See the "KV Cache Number of Blocks" section below for details. | `None`
`decoder_batch_sizes` | `Optional[List[int]]` | A list of batch sizes for which separate decoder models will be compiled. This allows the model to handle varying batch sizes efficiently during generation. If not specified, defaults to a list containing only the model's main batch size. When specifying multiple batch sizes: 1) all values must be less than or equal to the main batch size; 2) the list will be sorted in descending order (larger batch sizes first); 3) if using multiple decoders, at least one batch size should match the main batch size. | `None`
`cache_impl` | `Optional[CacheImplType]` | Specifies the KV cache implementation strategy. Defaults to "static". "static" uses a fixed-size global KV cache for all layers, suitable for standard attention patterns; "sliding_window" implements a sliding window KV cache, where each layer maintains a local cache of recent tokens; "hybrid" combines both approaches, allowing different layers to use different cache strategies. The choice affects memory usage and attention patterns. When using "sliding_window" or "hybrid", you must specify the `sliding_window` size (and, for "hybrid", `sliding_window_layers`). | `None`
`sliding_window` | `Optional[int]` | The size of the sliding window. Defaults to None. | `None`
`sliding_window_layers` | `Optional[List[int]]` | The layers to use the sliding window for in the hybrid model. Defaults to None. | `None`
`phases` | `Optional[List[PhaseType]]` | The phases to compile the model for. Defaults to ["prefill"] if DecoderOnlyModel is used, and ["prefill", "decode"] if DecoderOnlyModelForCausalLM is used. | `None`
`logits_to_keep` | `Optional[int]` | The number of logits to keep for the decoder. If set to 0, the decoder keeps all logits. Defaults to 0 if DecoderOnlyModel is used, and 1 if DecoderOnlyModelForCausalLM is used. | `None`
`kwargs` | | Additional arguments passed to the parent RBLNModelConfig. | `{}`

Raises:

Type | Description
---|---
`ValueError` | If `batch_size` is not a positive integer.
`ValueError` | If `prefill_chunk_size` is not a positive integer divisible by 64.
`ValueError` | If `max_seq_len` is neither provided nor inferable from the model configuration.
`ValueError` | If attention parameter constraints are violated (e.g., the `max_seq_len` and `kvcache_partition_len` requirements for "flash_attn").
Attention Implementation

`attn_impl` determines the underlying attention mechanism used by the model.

- `"eager"` (default if `kvcache_partition_len` is not set): Uses the standard PyTorch attention implementation. Suitable for sequences up to a certain limit (e.g., 32,768 tokens).
- `"flash_attn"`: Utilizes an optimized Flash Attention implementation, beneficial for longer sequences and potentially faster execution. Requires `max_seq_len` to be at least 8,192. If `kvcache_partition_len` is specified, `attn_impl` automatically defaults to `"flash_attn"`. When using `"flash_attn"`, `kvcache_block_size` must equal `kvcache_partition_len`.

The choice impacts performance and memory usage, especially for long sequences. Constraints related to `max_seq_len` and `kvcache_partition_len` apply when using `"flash_attn"`.
KV Cache Partition Length

`kvcache_partition_len` is relevant only when `attn_impl` is `"flash_attn"`.

- It defines the length (number of tokens) of each partition within the Key-Value (KV) cache.
- Must be between 4,096 and 32,768 (inclusive).
- When using `"flash_attn"`, `max_seq_len` must be a multiple of `kvcache_partition_len` and at least twice its value (`max_seq_len >= 2 * kvcache_partition_len`).
- If `attn_impl` is `"flash_attn"` and `kvcache_partition_len` is `None`, it defaults to 16,384.
KV Cache Number of Blocks

`kvcache_num_blocks` controls the total number of memory blocks allocated for the PagedAttention KV cache. Each block holds `kvcache_block_size` tokens of Key and Value states.

- Automatic Estimation (Default): If `kvcache_num_blocks` is `None`, the system estimates the maximum number of blocks that can fit into the available RBLN device memory. This calculation considers the model size (kernel memory), required buffer memory, the number of layers and heads, `kvcache_block_size`, tensor parallelism, and available RBLN NPU DRAM. This aims to maximize cache capacity for potentially better performance with long sequences or larger batches without manual tuning.
- Manual Setting: You can explicitly set the number of blocks. This provides finer control but requires careful consideration of memory limits. Setting it too high may lead to compilation errors if it exceeds available memory. The system will issue warnings if your setting exceeds the estimated maximum.
- Performance Impact: A larger number of blocks reduces the likelihood of cache eviction, which is beneficial for tasks involving many long sequences or large batch sizes, enabling higher throughput. However, allocating more blocks consumes more memory.
- Minimum Requirement: The system requires a minimum number of blocks to function, calculated based on `max_seq_len`, `kvcache_block_size`, and `batch_size`. The number of allocated blocks must be sufficient to hold at least one full sequence length per item in the batch concurrently. The system will log warnings or raise errors if constraints are violated (e.g., if `kvcache_num_blocks` is less than `batch_size` when using Flash Attention).

The optimal value depends on the specific model, task, hardware, and desired trade-off between performance and memory usage. The automatic estimation provides a robust starting point.
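Putting the constraints above together, an illustrative long-context configuration. The config class name `RBLNLlamaForCausalLMConfig` follows the library's naming pattern and is an assumption here; the checkpoint id is a placeholder, and the values satisfy the stated multiples:

```python
from optimum.rbln import RBLNLlamaForCausalLM, RBLNLlamaForCausalLMConfig  # config class name assumed

# max_seq_len (32,768) is 2x the partition length (16,384), satisfying both
# max_seq_len >= 2 * kvcache_partition_len and the multiple-of constraint.
# Specifying kvcache_partition_len implies attn_impl="flash_attn".
rbln_config = RBLNLlamaForCausalLMConfig(
    batch_size=1,
    max_seq_len=32768,
    kvcache_partition_len=16384,
    kvcache_block_size=16384,  # must equal kvcache_partition_len for flash_attn
)

model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder checkpoint
    export=True,
    rbln_config=rbln_config,
)
```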
load(path, **kwargs) classmethod¶

Load a RBLNModelConfig from a path.

Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required
`kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}`

Returns:

Name | Type | Description
---|---|---
RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
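A small sketch of loading a saved configuration with an override (the path is a placeholder; the `rbln_` prefix stripping is the behavior documented above):

```python
from optimum.rbln import RBLNModelConfig

# Load a saved configuration and override a value; the rbln_ prefix is stripped.
config = RBLNModelConfig.load("./llama-rbln", rbln_batch_size=4)
print(config.batch_size)  # 4
```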
This file defines generic base classes for various RBLN models, such as Question Answering, Image Classification, Audio Classification, Sequence Classification, and Masked Language Modeling. These classes implement common functionalities and configurations to be used across different model architectures.
Classes¶
RBLNTransformerEncoder¶

Bases: RBLNModel
Functions¶
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
forward(*args, return_dict=None, **kwargs)¶

Defines the forward pass of `RBLNModel`. The interface mirrors HuggingFace conventions, so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, `RBLNModel` can replace models built on `torch.nn.Module`, including `transformers.PreTrainedModel` implementations and Diffusers components based on `diffusers.ModelMixin`, enabling seamless integration into existing workflows.

Parameters:

Name | Type | Description | Default
---|---|---|---
`args` | `Any` | Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., `input_ids` and `attention_mask` for transformers models, or `sample` and `timestep` for diffusers models). | `()`
`return_dict` | `Optional[bool]` | Whether to return outputs as a dictionary-like object or as a tuple. When `None`, the model's default configuration is used. | `None`
`kwargs` | `Any` | Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface. | `{}`

Returns:

Type | Description
---|---
`Any` | Model outputs in the same format as the original HuggingFace model.
`Any` | If `return_dict=True`, outputs are returned as a dictionary-like model output object.
`Any` | If `return_dict=False`, outputs are returned as a tuple.

Note

- This method maintains the exact same interface as the original HuggingFace model's forward method
- The compiled model runs on RBLN NPU hardware for accelerated inference
- All HuggingFace model features (generation, attention patterns, etc.) are preserved
- Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
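A minimal sketch of the `return_dict` behavior described above, using a concrete encoder subclass from this page (the checkpoint id is a placeholder; padding to a fixed length reflects the static shapes of compiled models, an assumption here):

```python
from transformers import AutoTokenizer
from optimum.rbln import RBLNModelForMaskedLM  # any RBLN model exposes the same forward interface

model_id = "bert-base-uncased"  # placeholder checkpoint
model = RBLNModelForMaskedLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("RBLN models mirror the HuggingFace forward interface.",
                   return_tensors="pt", padding="max_length", max_length=128)

as_dict = model(**inputs, return_dict=True)    # dictionary-like model output
as_tuple = model(**inputs, return_dict=False)  # the same values as a plain tuple
print(type(as_dict), type(as_tuple))
```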
RBLNTransformerEncoderForFeatureExtraction¶

Bases: RBLNTransformerEncoder
Functions¶
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
forward(*args, return_dict=None, **kwargs)¶

Defines the forward pass of `RBLNModel`. The interface mirrors HuggingFace conventions, so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, `RBLNModel` can replace models built on `torch.nn.Module`, including `transformers.PreTrainedModel` implementations and Diffusers components based on `diffusers.ModelMixin`, enabling seamless integration into existing workflows.

Parameters:

Name | Type | Description | Default
---|---|---|---
`args` | `Any` | Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., `input_ids` and `attention_mask` for transformers models, or `sample` and `timestep` for diffusers models). | `()`
`return_dict` | `Optional[bool]` | Whether to return outputs as a dictionary-like object or as a tuple. When `None`, the model's default configuration is used. | `None`
`kwargs` | `Any` | Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface. | `{}`

Returns:

Type | Description
---|---
`Any` | Model outputs in the same format as the original HuggingFace model.
`Any` | If `return_dict=True`, outputs are returned as a dictionary-like model output object.
`Any` | If `return_dict=False`, outputs are returned as a tuple.

Note

- This method maintains the exact same interface as the original HuggingFace model's forward method
- The compiled model runs on RBLN NPU hardware for accelerated inference
- All HuggingFace model features (generation, attention patterns, etc.) are preserved
- Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
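A hedged feature-extraction sketch (the checkpoint id is a placeholder; the fixed-length padding and the `last_hidden_state` attribute follow HuggingFace conventions and are assumptions here):

```python
from transformers import AutoTokenizer
from optimum.rbln import RBLNTransformerEncoderForFeatureExtraction

model_id = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder encoder checkpoint
model = RBLNTransformerEncoderForFeatureExtraction.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("RBLN NPUs accelerate inference.", return_tensors="pt",
                   padding="max_length", max_length=128)
outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # token-level features, HF-compatible output
```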
RBLNModelForQuestionAnswering¶

Bases: RBLNTransformerEncoder
Functions¶
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
forward(*args, return_dict=None, **kwargs)¶

Defines the forward pass of `RBLNModel`. The interface mirrors HuggingFace conventions, so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, `RBLNModel` can replace models built on `torch.nn.Module`, including `transformers.PreTrainedModel` implementations and Diffusers components based on `diffusers.ModelMixin`, enabling seamless integration into existing workflows.

Parameters:

Name | Type | Description | Default
---|---|---|---
`args` | `Any` | Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., `input_ids` and `attention_mask` for transformers models, or `sample` and `timestep` for diffusers models). | `()`
`return_dict` | `Optional[bool]` | Whether to return outputs as a dictionary-like object or as a tuple. When `None`, the model's default configuration is used. | `None`
`kwargs` | `Any` | Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface. | `{}`

Returns:

Type | Description
---|---
`Any` | Model outputs in the same format as the original HuggingFace model.
`Any` | If `return_dict=True`, outputs are returned as a dictionary-like model output object.
`Any` | If `return_dict=False`, outputs are returned as a tuple.

Note

- This method maintains the exact same interface as the original HuggingFace model's forward method
- The compiled model runs on RBLN NPU hardware for accelerated inference
- All HuggingFace model features (generation, attention patterns, etc.) are preserved
- Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
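A hedged question-answering sketch (the checkpoint id is a placeholder; the fixed-length padding and the `start_logits`/`end_logits` post-processing follow standard HuggingFace conventions):

```python
import torch
from transformers import AutoTokenizer
from optimum.rbln import RBLNModelForQuestionAnswering

model_id = "deepset/roberta-base-squad2"  # placeholder QA checkpoint
model = RBLNModelForQuestionAnswering.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question, context = "What runs the model?", "The compiled model runs on an RBLN NPU."
inputs = tokenizer(question, context, return_tensors="pt",
                   padding="max_length", max_length=384)
outputs = model(**inputs)

# Standard HF post-processing: pick the most likely start/end token span.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits)) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))
```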
RBLNModelForSequenceClassification¶

Bases: RBLNTransformerEncoder
Functions¶
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
forward(*args, return_dict=None, **kwargs)¶

Defines the forward pass of `RBLNModel`. The interface mirrors HuggingFace conventions, so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, `RBLNModel` can replace models built on `torch.nn.Module`, including `transformers.PreTrainedModel` implementations and Diffusers components based on `diffusers.ModelMixin`, enabling seamless integration into existing workflows.

Parameters:

Name | Type | Description | Default
---|---|---|---
`args` | `Any` | Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., `input_ids` and `attention_mask` for transformers models, or `sample` and `timestep` for diffusers models). | `()`
`return_dict` | `Optional[bool]` | Whether to return outputs as a dictionary-like object or as a tuple. When `None`, the model's default configuration is used. | `None`
`kwargs` | `Any` | Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface. | `{}`

Returns:

Type | Description
---|---
`Any` | Model outputs in the same format as the original HuggingFace model.
`Any` | If `return_dict=True`, outputs are returned as a dictionary-like model output object.
`Any` | If `return_dict=False`, outputs are returned as a tuple.

Note

- This method maintains the exact same interface as the original HuggingFace model's forward method
- The compiled model runs on RBLN NPU hardware for accelerated inference
- All HuggingFace model features (generation, attention patterns, etc.) are preserved
- Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
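A hedged sequence-classification sketch (the checkpoint id is a placeholder; fixed-length padding reflects the static shapes of compiled models, an assumption here):

```python
import torch
from transformers import AutoTokenizer
from optimum.rbln import RBLNModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder classifier
model = RBLNModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("A wonderfully concise API reference.", return_tensors="pt",
                   padding="max_length", max_length=128)
logits = model(**inputs).logits
print(int(torch.argmax(logits)))  # predicted class id
```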
RBLNModelForMaskedLM¶

Bases: RBLNTransformerEncoder
Functions¶
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
forward(*args, return_dict=None, **kwargs)¶

Defines the forward pass of `RBLNModel`. The interface mirrors HuggingFace conventions, so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, `RBLNModel` can replace models built on `torch.nn.Module`, including `transformers.PreTrainedModel` implementations and Diffusers components based on `diffusers.ModelMixin`, enabling seamless integration into existing workflows.

Parameters:

Name | Type | Description | Default
---|---|---|---
`args` | `Any` | Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., `input_ids` and `attention_mask` for transformers models, or `sample` and `timestep` for diffusers models). | `()`
`return_dict` | `Optional[bool]` | Whether to return outputs as a dictionary-like object or as a tuple. When `None`, the model's default configuration is used. | `None`
`kwargs` | `Any` | Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface. | `{}`

Returns:

Type | Description
---|---
`Any` | Model outputs in the same format as the original HuggingFace model.
`Any` | If `return_dict=True`, outputs are returned as a dictionary-like model output object.
`Any` | If `return_dict=False`, outputs are returned as a tuple.

Note

- This method maintains the exact same interface as the original HuggingFace model's forward method
- The compiled model runs on RBLN NPU hardware for accelerated inference
- All HuggingFace model features (generation, attention patterns, etc.) are preserved
- Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
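A hedged fill-mask sketch (the checkpoint id is a placeholder; the mask-position lookup is standard HuggingFace post-processing):

```python
import torch
from transformers import AutoTokenizer
from optimum.rbln import RBLNModelForMaskedLM

model_id = "bert-base-uncased"  # placeholder MLM checkpoint
model = RBLNModelForMaskedLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt", padding="max_length", max_length=128)
logits = model(**inputs).logits

# Find the masked position and decode the highest-scoring token.
mask_pos = int((inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1])
print(tokenizer.decode([int(torch.argmax(logits[0, mask_pos]))]))
```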
RBLNModelForTextEncoding¶

Bases: RBLNTransformerEncoder
Functions¶
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
forward(*args, return_dict=None, **kwargs)¶

Defines the forward pass of `RBLNModel`. The interface mirrors HuggingFace conventions, so it can act as a drop-in replacement in many cases.

This method executes the compiled RBLN model on RBLN NPU devices while remaining fully compatible with Hugging Face Transformers and Diffusers APIs. In practice, `RBLNModel` can replace models built on `torch.nn.Module`, including `transformers.PreTrainedModel` implementations and Diffusers components based on `diffusers.ModelMixin`, enabling seamless integration into existing workflows.

Parameters:

Name | Type | Description | Default
---|---|---|---
`args` | `Any` | Variable length argument list containing model inputs. The format matches the original HuggingFace model's forward method signature (e.g., `input_ids` and `attention_mask` for transformers models, or `sample` and `timestep` for diffusers models). | `()`
`return_dict` | `Optional[bool]` | Whether to return outputs as a dictionary-like object or as a tuple. When `None`, the model's default configuration is used. | `None`
`kwargs` | `Any` | Arbitrary keyword arguments containing additional model inputs and parameters, matching the original HuggingFace model's interface. | `{}`

Returns:

Type | Description
---|---
`Any` | Model outputs in the same format as the original HuggingFace model.
`Any` | If `return_dict=True`, outputs are returned as a dictionary-like model output object.
`Any` | If `return_dict=False`, outputs are returned as a tuple.

Note

- This method maintains the exact same interface as the original HuggingFace model's forward method
- The compiled model runs on RBLN NPU hardware for accelerated inference
- All HuggingFace model features (generation, attention patterns, etc.) are preserved
- Can be used directly in HuggingFace pipelines, transformers.Trainer, and other workflows
Classes¶
RBLNTransformerEncoderConfig¶

Bases: RBLNModelConfig
Functions¶
__init__(max_seq_len=None, batch_size=None, model_input_names=None, model_input_shapes=None, **kwargs)¶

Parameters:

Name | Type | Description | Default
---|---|---|---
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None`
`batch_size` | `Optional[int]` | The batch size for inference. Defaults to 1. | `None`
`model_input_names` | `Optional[List[str]]` | Names of the input tensors for the model. Defaults to class-specific `rbln_model_input_names` if not provided. | `None`
`kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}`

Raises:

Type | Description
---|---
`ValueError` | If `batch_size` is not a positive integer.
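A minimal construction sketch (it assumes the config class is importable from optimum.rbln, mirroring the model classes; the values are illustrative):

```python
from optimum.rbln import RBLNTransformerEncoderConfig  # import path assumed

# Fix the static shapes the encoder will be compiled with.
config = RBLNTransformerEncoderConfig(
    max_seq_len=128,
    batch_size=1,
    model_input_names=["input_ids", "attention_mask"],
)
```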
load(path, **kwargs) classmethod¶

Load a RBLNModelConfig from a path.

Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required
`kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}`

Returns:

Name | Type | Description
---|---|---
RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNTransformerEncoderForFeatureExtractionConfig¶

Bases: RBLNTransformerEncoderConfig
Functions¶
__init__(max_seq_len=None, batch_size=None, model_input_names=None, model_input_shapes=None, **kwargs)¶

Parameters:

Name | Type | Description | Default
---|---|---|---
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None`
`batch_size` | `Optional[int]` | The batch size for inference. Defaults to 1. | `None`
`model_input_names` | `Optional[List[str]]` | Names of the input tensors for the model. Defaults to class-specific `rbln_model_input_names` if not provided. | `None`
`kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}`

Raises:

Type | Description
---|---
`ValueError` | If `batch_size` is not a positive integer.
load(path, **kwargs) classmethod¶

Load a RBLNModelConfig from a path.

Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required
`kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}`

Returns:

Name | Type | Description
---|---|---
RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNModelForQuestionAnsweringConfig¶

Bases: RBLNTransformerEncoderConfig
Functions¶
__init__(max_seq_len=None, batch_size=None, model_input_names=None, model_input_shapes=None, **kwargs)¶

Parameters:

Name | Type | Description | Default
---|---|---|---
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None`
`batch_size` | `Optional[int]` | The batch size for inference. Defaults to 1. | `None`
`model_input_names` | `Optional[List[str]]` | Names of the input tensors for the model. Defaults to class-specific `rbln_model_input_names` if not provided. | `None`
`kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}`

Raises:

Type | Description
---|---
`ValueError` | If `batch_size` is not a positive integer.
load(path, **kwargs) classmethod¶

Load a RBLNModelConfig from a path.

Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required
`kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}`

Returns:

Name | Type | Description
---|---|---
RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNModelForSequenceClassificationConfig¶

Bases: RBLNTransformerEncoderConfig
Functions¶
__init__(max_seq_len=None, batch_size=None, model_input_names=None, model_input_shapes=None, **kwargs)¶

Parameters:

Name | Type | Description | Default
---|---|---|---
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None`
`batch_size` | `Optional[int]` | The batch size for inference. Defaults to 1. | `None`
`model_input_names` | `Optional[List[str]]` | Names of the input tensors for the model. Defaults to class-specific `rbln_model_input_names` if not provided. | `None`
`kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}`

Raises:

Type | Description
---|---
`ValueError` | If `batch_size` is not a positive integer.
load(path, **kwargs) classmethod¶

Load a RBLNModelConfig from a path.

Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required
`kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}`

Returns:

Name | Type | Description
---|---|---
RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNModelForMaskedLMConfig¶

Bases: RBLNTransformerEncoderConfig
Functions¶
__init__(max_seq_len=None, batch_size=None, model_input_names=None, model_input_shapes=None, **kwargs)¶

Parameters:

Name | Type | Description | Default
---|---|---|---
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None`
`batch_size` | `Optional[int]` | The batch size for inference. Defaults to 1. | `None`
`model_input_names` | `Optional[List[str]]` | Names of the input tensors for the model. Defaults to class-specific `rbln_model_input_names` if not provided. | `None`
`kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}`

Raises:

Type | Description
---|---
`ValueError` | If `batch_size` is not a positive integer.
load(path, **kwargs) classmethod¶

Load a RBLNModelConfig from a path.

Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required
`kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}`

Returns:

Name | Type | Description
---|---|---
RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
Classes¶
RBLNModelForSeq2SeqLM¶

Bases: RBLNModel, ABC

This is a generic model class that will be instantiated as one of the model classes of the library (with a sequence-to-sequence language modeling head) when created with the from_pretrained() class method.

This model inherits from [`RBLNModel`]. Check the superclass documentation for the generic methods the library implements for all its models.

A class to convert and run pre-trained transformers-based Seq2SeqLM models on RBLN devices. It implements the methods to convert a pre-trained transformers Seq2SeqLM model into a RBLN transformer model by:

- transferring the checkpoint weights of the original model into an optimized RBLN graph,
- compiling the resulting graph using the RBLN compiler.

Currently, this model class only supports the 'bart' and 't5' models from the transformers library. Future updates may include support for additional model types.
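A hedged usage sketch for a T5 checkpoint. The concrete subclass name `RBLNT5ForConditionalGeneration` is assumed from the library's naming pattern, and the checkpoint id is a placeholder:

```python
from transformers import AutoTokenizer
from optimum.rbln import RBLNT5ForConditionalGeneration  # subclass name assumed

model_id = "t5-small"  # placeholder seq2seq checkpoint
model = RBLNT5ForConditionalGeneration.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("translate English to German: Hello, world!", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```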
Functions¶
from_model(model, config=None, rbln_config=None, model_save_dir=None, subfolder='', **kwargs) classmethod¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model` | `PreTrainedModel` | The PyTorch model to be compiled. The object must be an instance of the HuggingFace transformers `PreTrainedModel` class. | required
`config` | `Optional[PretrainedConfig]` | The configuration object associated with the model. | `None`
`rbln_config` | `Optional[Union[RBLNModelConfig, Dict]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

The method performs the following steps:

- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
from_pretrained(model_id, export=None, rbln_config=None, **kwargs) classmethod¶

The `from_pretrained()` function is used in its standard form, as in the HuggingFace transformers library. Users can call this function to load a pre-trained model from the HuggingFace hub or a local path and convert it into a RBLN model to be run on RBLN NPUs.

Parameters:

Name | Type | Description | Default
---|---|---|---
`model_id` | `Union[str, Path]` | The model id of the pre-trained model to be loaded. It can be a model id from the HuggingFace model hub, a local path, or the model id of a model compiled with the RBLN Compiler. | required
`export` | `Optional[bool]` | Whether the model should be compiled. If `None`, it is determined by whether compiled model files exist at `model_id`. | `None`
`rbln_config` | `Optional[Union[Dict, RBLNModelConfig]]` | Configuration for RBLN model compilation and runtime. This can be provided as a dictionary or an instance of the model's configuration class. | `None`
`kwargs` | `Any` | Additional keyword arguments. Arguments with the prefix `rbln_` are passed to `rbln_config`, while the remaining arguments are passed to the HuggingFace library. | `{}`

Returns:

Type | Description
---|---
`RBLNModel` | A RBLN model instance ready for inference on RBLN NPU devices.
save_pretrained(save_directory, push_to_hub=False, **kwargs)¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [`~optimum.rbln.modeling_base.RBLNBaseModel.from_pretrained`] class method.

Parameters:

Name | Type | Description | Default
---|---|---|---
`save_directory` | `Union[str, Path]` | Directory where to save the model file. | required
`push_to_hub` | `bool` | Whether or not to push your model to the HuggingFace model hub after saving it. | `False`
Classes¶
RBLNModelForSeq2SeqLMConfig¶

Bases: RBLNModelConfig
Functions¶
__init__(batch_size=None, enc_max_seq_len=None, dec_max_seq_len=None, use_attention_mask=None, pad_token_id=None, kvcache_num_blocks=None, kvcache_block_size=None, **kwargs)¶

Parameters:

Name | Type | Description | Default
---|---|---|---
`batch_size` | `Optional[int]` | The batch size for inference. Defaults to 1. | `None`
`enc_max_seq_len` | `Optional[int]` | Maximum sequence length for the encoder. | `None`
`dec_max_seq_len` | `Optional[int]` | Maximum sequence length for the decoder. | `None`
`use_attention_mask` | `Optional[bool]` | Whether to use attention masks during inference. | `None`
`pad_token_id` | `Optional[int]` | The ID of the padding token in the vocabulary. | `None`
`kvcache_num_blocks` | `Optional[int]` | The total number of blocks to allocate for the PagedAttention KV cache of the self-attention. Defaults to `batch_size`. | `None`
`kvcache_block_size` | `Optional[int]` | Sets the size (in number of tokens) of each block in the PagedAttention KV cache of the self-attention. Defaults to `dec_max_seq_len`. | `None`
`kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}`

Raises:

Type | Description
---|---
`ValueError` | If `batch_size` is not a positive integer.
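A minimal construction sketch (it assumes the config class is importable from optimum.rbln; the values are illustrative):

```python
from optimum.rbln import RBLNModelForSeq2SeqLMConfig  # import path assumed

# Separate static lengths for the encoder and decoder sides.
config = RBLNModelForSeq2SeqLMConfig(
    batch_size=1,
    enc_max_seq_len=512,
    dec_max_seq_len=256,
)
```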
load(path, **kwargs) classmethod¶

Load a RBLNModelConfig from a path.

Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required
`kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}`

Returns:

Name | Type | Description
---|---|---
RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance.

Note

This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.