Model API¶

`optimum API`¶

Generic model classes¶

Classes¶

`RBLNBaseModel` ¶

An abstract base class for compiling, loading, and saving neural network models from the HuggingFace transformers and diffusers libraries to run on RBLN NPU devices.

This class supports loading and saving models using the from_pretrained and save_pretrained methods, similar to the HuggingFace libraries.

The from_pretrained method loads a model corresponding to the given model_id from a local repository or the HuggingFace Hub onto the NPU. If the model is a PyTorch model and export=True is passed as a kwarg, it compiles the PyTorch model corresponding to the given model_id before loading. If model_id refers to a model that has already been compiled for RBLN, it can be loaded directly onto the NPU with export=False.

rbln_npu is a kwarg used during compilation that specifies the name of the target NPU. If this keyword is not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine and rbln_npu is not specified, an error occurs.

rbln_device specifies the device to be used at runtime. If not specified, device 0 is used.

rbln_create_runtimes indicates whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.

rbln_config is a dictionary that allows passing configurations for the model and its submodules. Any parameter prefixed with rbln_ in the from_pretrained method is internally interpreted as a value in rbln_config.

For example, rbln_batch_size=4 is equivalent to passing rbln_config={"batch_size": 4}.

Example usage of rbln_config:

model = RBLNBaseModel.from_pretrained(
    model_id,
    export=True,
    rbln_config={
        "batch_size": 4,
    },
)

This is equivalent to:

model = RBLNBaseModel.from_pretrained(
    model_id,
    export=True,
    rbln_batch_size=4,
)
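The prefix convention described above can be pictured with a small pure-Python sketch. The helper `split_rbln_kwargs` is hypothetical and written only for illustration; it is not part of the library and does not claim to mirror its internals. What happens when the same key is given both as a prefixed kwarg and inside rbln_config is not stated here, so the precedence used below is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def split_rbln_kwargs(
    rbln_config: Optional[Dict[str, Any]] = None, **kwargs: Any
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: fold `rbln_`-prefixed kwargs into one config dict.

    Returns (merged_rbln_config, remaining_kwargs). Prefixed kwargs override
    entries of `rbln_config` here; that precedence is an assumption.
    """
    prefix = "rbln_"
    merged = dict(rbln_config or {})
    remaining: Dict[str, Any] = {}
    for name, value in kwargs.items():
        if name.startswith(prefix):
            merged[name[len(prefix):]] = value  # rbln_batch_size -> batch_size
        else:
            remaining[name] = value  # left for the upstream from_pretrained
    return merged, remaining


# The two call styles shown above describe the same configuration.
from_dict, _ = split_rbln_kwargs(rbln_config={"batch_size": 4})
from_kwarg, rest = split_rbln_kwargs(rbln_batch_size=4, revision="main")
print(from_dict == from_kwarg)
print(rest)
```

Arguments without the prefix, such as `revision` here, are kept apart because they are meant for the original from_pretrained method rather than for rbln_config.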

Models compiled in this way can be saved to a local repository using save_pretrained or uploaded to the HuggingFace Hub.

It also supports generation through generate (for transformers models that support generation).

RBLNBaseModel is designed for models consisting of an arbitrary number of torch.nn.Modules, so it is an abstract class with no concrete implementation of forward or export. A class that inherits from it must implement forward, export, and any other required methods.
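The abstract-class relationship can be sketched with Python's standard `abc` module. Every class name below is hypothetical, and declaring `export` as a classmethod is an assumption made for the example; the sketch only shows why the base cannot be used directly while a subclass that supplies forward and export can.

```python
from abc import ABC, abstractmethod


class SketchBaseModel(ABC):
    """Hypothetical stand-in for an abstract base such as RBLNBaseModel."""

    @abstractmethod
    def forward(self, *args, **kwargs):
        """Run the compiled model; concrete subclasses must provide this."""

    @classmethod
    @abstractmethod
    def export(cls, model_id, **kwargs):
        """Compile the source model; concrete subclasses must provide this."""


class SketchSingleModuleModel(SketchBaseModel):
    """Hypothetical subclass that fills in both abstract methods."""

    def forward(self, *args, **kwargs):
        return {"args": args, "kwargs": kwargs}  # placeholder output

    @classmethod
    def export(cls, model_id, **kwargs):
        return cls()  # placeholder for a real compile step


try:
    SketchBaseModel()  # abstract classes cannot be instantiated
    base_instantiated = True
except TypeError:
    base_instantiated = False

model = SketchSingleModuleModel.export("some-model-id")
print(base_instantiated, type(model).__name__)
```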

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_config=None, **kwargs)

classmethod ¶

Load a pretrained model from a given model ID and optimize it for NPU execution.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model or compiled model to be loaded. It can be a model id on the HuggingFace model hub or a local path.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_config`	`Optional[Dict[str, Any]]`	A dictionary containing configuration for the model and its submodules. This affects the compilation settings for both the main module and its submodules.	`None`
`**kwargs`	`Dict[str, Any]`	Additional keyword arguments. Any argument prefixed with 'rbln_' will be treated as part of rbln_config. Arguments without the 'rbln_' prefix will be passed directly to the original HuggingFace from_pretrained method.	`{}`
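One way to read the parameters above is as two separate workflows: compiling on a host without an NPU, then loading the compiled result on a host that has one. The sketch below only builds the keyword arguments for each step. Both helper names are hypothetical, and the NPU name `RBLN-CA12` is used purely as a placeholder; whether it is the right value for a given target is an assumption.

```python
from typing import Any, Dict


def compile_only_kwargs(npu_name: str, batch_size: int = 1) -> Dict[str, Any]:
    """Keyword arguments for compiling on a host that has no NPU installed."""
    return {
        "export": True,                 # compile the PyTorch model
        "rbln_npu": npu_name,           # named explicitly: no local NPU to fall back on
        "rbln_create_runtimes": False,  # do not load the model onto an NPU
        "rbln_batch_size": batch_size,  # treated as part of rbln_config
    }


def load_compiled_kwargs(device: int = 0) -> Dict[str, Any]:
    """Keyword arguments for loading an already compiled model at runtime."""
    return {"export": False, "rbln_device": device}


compile_kwargs = compile_only_kwargs("RBLN-CA12", batch_size=4)
load_kwargs = load_compiled_kwargs()
print(compile_kwargs)
print(load_kwargs)

# Intended use, assuming a concrete subclass such as RBLNModel is importable:
#   model = RBLNModel.from_pretrained(model_id, **compile_kwargs)
#   model.save_pretrained(save_directory)
#   model = RBLNModel.from_pretrained(save_directory, **load_kwargs)
```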

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory so that it can be reloaded with the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
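The "created if it doesn't exist" behaviour can be illustrated with the standard library alone. This is a stand-in, not the library's save logic: the helper names and the file name `sketch_config.json` are hypothetical, and creating missing parent directories as well is an assumption of the sketch.

```python
import json
import tempfile
from pathlib import Path


def save_config_sketch(config: dict, save_directory) -> Path:
    """Hypothetical stand-in for the directory handling of save_pretrained."""
    target = Path(save_directory)
    target.mkdir(parents=True, exist_ok=True)  # create it if it doesn't exist
    config_path = target / "sketch_config.json"  # hypothetical file name
    config_path.write_text(json.dumps(config))
    return config_path


def load_config_sketch(save_directory) -> dict:
    """Read back what save_config_sketch wrote."""
    return json.loads((Path(save_directory) / "sketch_config.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "compiled" / "my-model"  # does not exist yet
    save_config_sketch({"batch_size": 4}, nested)
    reloaded = load_config_sketch(nested)

print(reloaded)
```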

`RBLNModel` ¶

A class that inherits from RBLNBaseModel for models consisting of a single torch.nn.Module.

This class supports all the functionality of RBLNBaseModel, including loading and saving models using the from_pretrained and save_pretrained methods and compiling PyTorch models for execution on RBLN NPU devices.

model = RBLNModel.from_pretrained("model_id", export=True, rbln_npu="npu_name")
outputs = model(**inputs)

Natural Language Processing¶

Classes¶

`RBLNLlamaForCausalLM` ¶

The Llama model transformer with a language modeling head (linear layer) on top. This model inherits from RBLNModel; check the superclass documentation for the generic methods the library implements for all its models. This class converts and runs a pre-trained HuggingFace transformers LlamaForCausalLM. It implements the methods to convert a pre-trained transformers LlamaForCausalLM into a RBLNLlamaForCausalLM by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() method follows the same usage as in the HuggingFace transformers library. Use it to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`
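The guidance on rbln_attn_impl can be turned into a small decision helper. The sketch below is hypothetical: `attention_kwargs` is not a library function, the 32,768-token threshold is simply the recommendation quoted in the table, and rejecting a partition length for `"eager"` is a choice made for this example because the table says the setting is used exclusively with `flash_attn`. The partition length passed below is an arbitrary illustrative value, not a recommended one.

```python
from typing import Any, Dict, Optional

LONG_SEQUENCE_TOKENS = 32_768  # threshold quoted in the table above


def attention_kwargs(
    max_seq_len: int, kvcache_partition_len: Optional[int] = None
) -> Dict[str, Any]:
    """Hypothetical helper: choose rbln_attn_impl from the sequence length."""
    kwargs: Dict[str, Any] = {"rbln_max_seq_len": max_seq_len}
    if max_seq_len >= LONG_SEQUENCE_TOKENS:
        kwargs["rbln_attn_impl"] = "flash_attn"
        if kvcache_partition_len is not None:
            # Only meaningful together with flash_attn.
            kwargs["rbln_kvcache_partition_len"] = kvcache_partition_len
    else:
        if kvcache_partition_len is not None:
            raise ValueError("rbln_kvcache_partition_len is used only with flash_attn")
        kwargs["rbln_attn_impl"] = "eager"
    return kwargs


short = attention_kwargs(8_192)
long = attention_kwargs(32_768, kvcache_partition_len=16_384)
print(short)
print(long)
```

The returned dictionary is meant to be unpacked into from_pretrained or from_model alongside the other arguments.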

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNLlamaForCausalLM`	A RBLN model instance ready for inference on RBLN NPU devices.
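The rbln_device parameter accepts either a single integer or a list of integers. A caller that wants to treat both forms uniformly could normalize them first, as in the sketch below. `normalize_devices` is a hypothetical helper for illustration, and its validation rules are assumptions rather than documented library behaviour.

```python
from typing import List, Union


def normalize_devices(rbln_device: Union[int, List[int]] = 0) -> List[int]:
    """Hypothetical helper: view rbln_device as a list of device indices."""
    if isinstance(rbln_device, bool):
        raise TypeError("rbln_device must be an int or a list of ints")
    if isinstance(rbln_device, int):
        return [rbln_device]  # a single device
    devices = list(rbln_device)
    if not devices or not all(
        isinstance(d, int) and not isinstance(d, bool) for d in devices
    ):
        raise TypeError("rbln_device must be an int or a non-empty list of ints")
    return devices  # several devices, e.g. for tensor parallelism


print(normalize_devices())
print(normalize_devices(2))
print(normalize_devices([0, 1, 2, 3]))
```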

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory so that it can be reloaded with the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate method follows the same usage as in the HuggingFace transformers library. Use it to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNGemmaForCausalLM` ¶

The Gemma model transformer with a language modeling head (linear layer) on top. This model inherits from RBLNModel; check the superclass documentation for the generic methods the library implements for all its models. This class converts and runs a pre-trained HuggingFace transformers GemmaForCausalLM. It implements the methods to convert a pre-trained transformers GemmaForCausalLM into a RBLNGemmaForCausalLM by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() method follows the same usage as in the HuggingFace transformers library. Use it to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`
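As a rough picture of what partitioning the KV cache means, the sketch below splits a range of token positions into fixed-size chunks. It is plain arithmetic under stated assumptions: that the cache spans max_seq_len positions and that each partition covers at most rbln_kvcache_partition_len of them. The function name is hypothetical, and the sketch says nothing about how the compiler lays the cache out in memory or which partition lengths it accepts.

```python
from typing import List, Tuple


def kv_cache_partitions(max_seq_len: int, partition_len: int) -> List[Tuple[int, int]]:
    """Split positions [0, max_seq_len) into chunks of at most partition_len."""
    if max_seq_len <= 0 or partition_len <= 0:
        raise ValueError("lengths must be positive")
    return [
        (start, min(start + partition_len, max_seq_len))
        for start in range(0, max_seq_len, partition_len)
    ]


# Illustrative values only.
chunks = kv_cache_partitions(max_seq_len=32_768, partition_len=8_192)
print(len(chunks), chunks[0], chunks[-1])
```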

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNGemmaForCausalLM`	A RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory so that it can be reloaded with the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate method follows the same usage as in the HuggingFace transformers library. Use it to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNMistralForCausalLM` ¶

The Mistral model transformer with a language modeling head (linear layer) on top. This model inherits from RBLNModel; check the superclass documentation for the generic methods the library implements for all its models. This class converts and runs a pre-trained HuggingFace transformers MistralForCausalLM. It implements the methods to convert a pre-trained transformers MistralForCausalLM into a RBLNMistralForCausalLM by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() method follows the same usage as in the HuggingFace transformers library. Use it to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNMistralForCausalLM`	A RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory so that it can be reloaded with the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate method follows the same usage as in the HuggingFace transformers library. Use it to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNQwen2ForCausalLM` ¶

The Qwen2 model transformer with a language modeling head (linear layer) on top. This model inherits from RBLNModel; check the superclass documentation for the generic methods the library implements for all its models. This class converts and runs a pre-trained HuggingFace transformers Qwen2ForCausalLM. It implements the methods to convert a pre-trained transformers Qwen2ForCausalLM into a RBLNQwen2ForCausalLM by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() method follows the same usage as in the HuggingFace transformers library. Use it to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	The attention implementation to use. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention, recommended for long sequences (>= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	The partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when you run inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNQwen2ForCausalLM`	A RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory so that it can be reloaded with the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate method follows the same usage as in the HuggingFace transformers library. Use it to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNExaoneForCausalLM` ¶

The EXAONE model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings). This model inherits from RBLNModel; check the superclass documentation for the generic methods the library implements for all its models. This class converts and runs a pre-trained HuggingFace transformers ExaoneForCausalLM. It implements the methods to convert a pre-trained transformers ExaoneForCausalLM into a RBLNExaoneForCausalLM by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	Specifies the attention implementation method. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention (recommended for long sequences, >= 32,768 tokens).	`None`
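`rbln_kvcache_partition_len`	`Optional[int]`	Defines the partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`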
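
Because `rbln_kvcache_partition_len` only applies to `flash_attn`, it can help to assemble the attention-related arguments in one place and reject inconsistent combinations early. The helper below is a hypothetical sketch, not part of the library, and the sequence and partition lengths are illustrative values rather than recommended settings.

```python
def attention_kwargs(attn_impl="eager", max_seq_len=None, kvcache_partition_len=None):
    """Build rbln_* attention keyword arguments for from_pretrained."""
    if attn_impl not in ("eager", "flash_attn"):
        raise ValueError(f"unknown attention implementation: {attn_impl!r}")
    if kvcache_partition_len is not None and attn_impl != "flash_attn":
        # The parameter table states the partition length is used only with flash_attn.
        raise ValueError("rbln_kvcache_partition_len requires rbln_attn_impl='flash_attn'")
    return {
        "rbln_attn_impl": attn_impl,
        "rbln_max_seq_len": max_seq_len,
        "rbln_kvcache_partition_len": kvcache_partition_len,
    }


# Illustrative values only: a 32,768-token context split into 4,096-token KV-cache partitions.
long_context = attention_kwargs("flash_attn", max_seq_len=32768, kvcache_partition_len=4096)

# Usage (requires the RBLN SDK and is therefore left as a comment):
# RBLNExaoneForCausalLM.from_pretrained(model_id, export=True, **long_context)
```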

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_attn_impl=None, rbln_kvcache_partition_len=None, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_attn_impl`	`Optional[str]`	Specifies the attention implementation method. Options: `"eager"`, standard attention computation for general-purpose tasks; `"flash_attn"`, Flash Attention (recommended for long sequences, >= 32,768 tokens).	`None`
`rbln_kvcache_partition_len`	`Optional[int]`	Defines the partition size of the KV cache. Used exclusively with `flash_attn`. It divides the KV cache into smaller chunks to address memory constraints.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNExaoneForCausalLM`	A RBLN model instance ready for inference on RBLN NPU devices.
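
The `rbln_npu` and `rbln_create_runtimes` descriptions together suggest a compile-only workflow on a host without an NPU: name the target NPU explicitly and skip runtime creation. The sketch below illustrates that combination. `compile_without_npu` is a hypothetical helper, the default `max_seq_len` is an illustrative value, and `RBLN-CA12` is used only because it is the NPU name mentioned in the parameter table.

```python
def compile_without_npu(rbln_cls, torch_model, npu_name="RBLN-CA12", max_seq_len=4096):
    """Compile an in-memory HuggingFace model without loading it onto a device."""
    return rbln_cls.from_model(
        torch_model,
        rbln_npu=npu_name,           # name the target NPU, since the host has none to detect
        rbln_create_runtimes=False,  # skip runtime creation, so nothing is loaded onto an NPU
        rbln_batch_size=1,
        rbln_max_seq_len=max_seq_len,
    )


# Usage (requires transformers and the RBLN SDK, so it is left as a comment):
# from transformers import AutoModelForCausalLM
# from optimum.rbln import RBLNExaoneForCausalLM  # assumed import path
# torch_model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
# compiled = compile_without_npu(RBLNExaoneForCausalLM, torch_model)
# compiled.save_pretrained("exaone-rbln")  # illustrative directory name
```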

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNPhiForCausalLM` ¶

The Phi model transformer with a language modeling head (linear layer) on top. This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models. This class converts and runs a pre-trained HuggingFace transformers PhiForCausalLM. It implements the methods to convert a pre-trained PhiForCausalLM into a RBLNPhiForCausalLM by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`
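
The `RBLNBaseModel` overview notes that any `rbln_`-prefixed argument is interpreted as an entry in `rbln_config`. The mapping can be pictured with the small sketch below. It only mimics the documented equivalence and is not the library's internal code, and `to_rbln_config` is a hypothetical name.

```python
def to_rbln_config(**kwargs):
    """Collect rbln_-prefixed keyword arguments into an rbln_config-style dict."""
    prefix = "rbln_"
    return {
        name[len(prefix):]: value
        for name, value in kwargs.items()
        # Non-RBLN arguments such as export are left out, as is rbln_config itself.
        if name.startswith(prefix) and name != "rbln_config"
    }


config = to_rbln_config(rbln_batch_size=4, rbln_max_seq_len=2048, export=True)
print(config)  # {'batch_size': 4, 'max_seq_len': 2048}
```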

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNPhiForCausalLM`	A RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNGPT2LMHeadModel` ¶

The GPT2 Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).

This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

It implements the methods to convert a pre-trained GPT2LMHeadModel into RBLNGPT2LMHeadModel by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into a RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNGPT2LMHeadModel`	A RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to generate text from the model, customizing generation through arguments such as input_ids, attention_mask, and max_length.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNT5ForConditionalGeneration` ¶

RBLN implementation of T5ForConditionalGeneration, optimized for NPU execution.

This class provides an interface compatible with HuggingFace's T5ForConditionalGeneration, but with RBLN-specific optimizations. It implements three key methods:

from_pretrained: Loads a pre-trained T5 model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.
generate: Generates new text sequences based on input prompts, similar to the HuggingFace implementation.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_enc_max_seq_len=None, rbln_dec_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_enc_max_seq_len`	`Optional[int]`	The maximum sequence length of the encoder. If not specified, model config's value is used.	`None`
`rbln_dec_max_seq_len`	`Optional[int]`	The maximum sequence length of the decoder. If not specified, model config's value is used.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNBartForConditionalGeneration` ¶

RBLN implementation for BART (Bidirectional and Auto-Regressive Transformers), optimized for NPU execution.

This class provides an interface compatible with HuggingFace's BartForConditionalGeneration, but with RBLN-specific optimizations. It implements three key methods:

from_pretrained: Loads a pre-trained BartForConditionalGeneration model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.
generate: Generates new text sequences based on input prompts, similar to the HuggingFace implementation.

Note: As of now, beam search in the generate method is not supported.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_enc_max_seq_len=None, rbln_dec_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_enc_max_seq_len`	`Optional[int]`	The maximum sequence length of the encoder. If not specified, model config's value is used.	`None`
`rbln_dec_max_seq_len`	`Optional[int]`	The maximum sequence length of the decoder. If not specified, model config's value is used.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`

`RBLNBertForQuestionAnswering` ¶

RBLN implementation for BertForQuestionAnswering, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's BertForQuestionAnswering, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained BertForQuestionAnswering model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
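
Once a question-answering model has run, the answer is usually recovered from per-token start and end scores. The sketch below shows that post-processing step in plain Python. It assumes the RBLN model returns start and end logits in the same way as HuggingFace's BertForQuestionAnswering, which this page does not confirm, and `best_answer_span` is a hypothetical helper.

```python
def best_answer_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices maximizing start_logit + end_logit."""
    best, best_score = (0, 0), float("-inf")
    for start, start_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        stop = min(start + max_answer_len, len(end_logits))
        for end in range(start, stop):
            score = start_logit + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


print(best_answer_span([0.1, 2.0, 0.3, 0.2], [0.0, 0.1, 1.5, 0.4]))  # (1, 2)
```

The returned indices refer to token positions, so the answer text is obtained by decoding the corresponding slice of the input ids with the tokenizer.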

`RBLNDistilBertForQuestionAnswering` ¶

RBLN implementation for DistilBertForQuestionAnswering, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's DistilBertForQuestionAnswering, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained DistilBertForQuestionAnswering model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNBertForMaskedLM` ¶

RBLN implementation for BertForMaskedLM, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's BertForMaskedLM, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained BertForMaskedLM model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNMidmLMHeadModel` ¶

The Mi:dm model transformer with a language modeling head (linear layer) on top. This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models. This class converts and runs a pre-trained HuggingFace transformers MidmLMHeadModel. It implements the methods to convert a pre-trained MidmLMHeadModel into a RBLNMidmLMHeadModel by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, trust_remote_code=True, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() function is utilized in its standard form as in the HuggingFace transformers library. Users can use this function to load a pre-trained model from the library.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`trust_remote_code`	`bool`	A boolean flag to allow or disallow the execution of custom code from the model repository. If set to `True`, it permits the model to execute custom code in the model repository, which may include additional model architectures, tokenizers, or processing scripts. Set this to `False` to enforce stricter security when loading models from untrusted sources.	`True`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`
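
As a rough illustration of how these parameters fit together, the sketch below assembles keyword arguments for a compile-only run. `midm_compile_kwargs` is a hypothetical helper, not part of the library. Its check that a tensor parallel size above 1 needs `RBLN-CA12` simply mirrors the table above and is applied only when an NPU name is given explicitly.

```python
from typing import Any, Dict, Optional


def midm_compile_kwargs(
    batch_size: int = 1,
    max_seq_len: Optional[int] = None,
    tensor_parallel_size: int = 1,
    npu: Optional[str] = None,
    create_runtimes: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect from_pretrained keyword arguments."""
    # Per the table, tensor parallelism is only available on ATOM+ (RBLN-CA12).
    if tensor_parallel_size > 1 and npu is not None and npu != "RBLN-CA12":
        raise ValueError("rbln_tensor_parallel_size > 1 requires RBLN-CA12")
    kwargs: Dict[str, Any] = {
        "export": True,  # compile the PyTorch checkpoint
        "rbln_batch_size": batch_size,
        "rbln_tensor_parallel_size": tensor_parallel_size,
        "rbln_create_runtimes": create_runtimes,
    }
    # Optional values are passed only when set, so library defaults apply otherwise.
    if max_seq_len is not None:
        kwargs["rbln_max_seq_len"] = max_seq_len
    if npu is not None:
        kwargs["rbln_npu"] = npu
    return kwargs


kwargs = midm_compile_kwargs(max_seq_len=8192, tensor_parallel_size=4, npu="RBLN-CA12")
# Intended use (requires optimum-rbln; model_id is a placeholder):
# model = RBLNMidmLMHeadModel.from_pretrained(model_id, **kwargs)
```

Leaving `create_runtimes` at `False` matches the compile-only scenario described for `rbln_create_runtimes`, where the host has no NPU.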

from_model(model, *, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_tensor_parallel_size=1, rbln_activate_profiler=None)

classmethod ¶

Converts and compiles a pre-trained HuggingFace library model into an RBLN model. This method performs the actual model conversion and compilation process.

Parameters:

Name	Type	Description	Default
`model`	`PreTrainedModel`	The PyTorch model to be compiled. This should be a model instance from HuggingFace libraries.	required
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[Union[int, List[int]]]`	The device(s) to be used at runtime. If an integer is provided, it specifies the single device to use. If a list of integers is provided, it specifies the devices to use for tensor parallelism across multiple NPUs.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_tensor_parallel_size`	`Optional[int]`	Compile and execute the model using multiple NPUs. This feature is only available on ATOM+ (`RBLN-CA12`). You can check the type of your current RBLN NPU using the `rbln-stat` command.	`1`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

The method performs the following steps:

Compiles the PyTorch model into an optimized RBLN graph
Configures the model for the specified NPU device
Creates the necessary runtime objects if requested
Saves the compiled model and configurations

Returns:

Type	Description
`RBLNMidmLMHeadModel`	An RBLN model instance ready for inference on RBLN NPU devices.

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
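
One possible way to combine `save_pretrained` with `from_pretrained` is to compile only when no saved copy exists yet. The sketch below is a caller-side caching pattern, not library behavior; `needs_export` is a hypothetical helper, and the library calls appear only as comments because they require optimum-rbln.

```python
from pathlib import Path


def needs_export(save_directory: str) -> bool:
    """Return True when no saved model is present in save_directory."""
    path = Path(save_directory)
    # A missing or empty directory means nothing has been saved there yet.
    return not (path.is_dir() and any(path.iterdir()))


# Intended use (requires optimum-rbln; model_id and save_dir are placeholders):
# if needs_export(save_dir):
#     model = RBLNMidmLMHeadModel.from_pretrained(model_id, export=True)
#     model.save_pretrained(save_dir)
# else:
#     model = RBLNMidmLMHeadModel.from_pretrained(save_dir, export=False)
```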

generate(input_ids, attention_mask=None, max_length=None) ¶

The generate function is used in the same way as in the HuggingFace transformers library. Use it to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_ids`	`LongTensor`	The sequence used as a prompt for the generation	required
`attention_mask`	`Optional[Tensor]`	The attention mask to apply on the sequence	`None`
`max_length`	`Optional[int]`	The maximum length of the sequence to be generated	`None`
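
For batched prompts of different lengths, `input_ids` and `attention_mask` are normally built together. The sketch below uses plain Python lists to show how the two relate; `pad_batch` is a hypothetical helper, and left padding with a pad id of 0 is an assumption made for illustration. In practice a tokenizer produces both tensors.

```python
from typing import List, Tuple


def pad_batch(
    sequences: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id lists and build the matching attention mask."""
    width = max(len(seq) for seq in sequences)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in sequences:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)  # padding goes first
        attention_mask.append([0] * n_pad + [1] * len(seq))  # 1 marks real tokens
    return input_ids, attention_mask


ids, mask = pad_batch([[11, 12, 13], [21]])
```

Each row of the mask has the same length as the corresponding row of ids, with zeros exactly where padding was added.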

`RBLNRobertaForMaskedLM` ¶

RBLN implementation for RobertaForMaskedLM, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's RobertaForMaskedLM, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained RobertaForMaskedLM model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`
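
Compiled models generally run with fixed input shapes, so inputs are usually padded to the compiled sequence length. The sketch below shows the idea with plain lists. `pad_to_max_seq_len` is a hypothetical helper, and both the fixed-shape requirement and the pad id of 1 (the usual RoBERTa padding id) are assumptions here rather than statements about the library. In practice a tokenizer called with `padding="max_length"` does this work.

```python
from typing import List, Tuple


def pad_to_max_seq_len(
    token_ids: List[int], max_seq_len: int, pad_id: int = 1
) -> Tuple[List[int], List[int]]:
    """Right-pad one sequence to max_seq_len and build its attention mask."""
    if len(token_ids) > max_seq_len:
        raise ValueError("sequence is longer than max_seq_len")
    n_pad = max_seq_len - len(token_ids)
    input_ids = token_ids + [pad_id] * n_pad
    attention_mask = [1] * len(token_ids) + [0] * n_pad  # 0 marks padding
    return input_ids, attention_mask


input_ids, attention_mask = pad_to_max_seq_len([0, 31414, 2], max_seq_len=6)
```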

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNRobertaForSequenceClassification` ¶

RBLN implementation for RobertaForSequenceClassification, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's RobertaForSequenceClassification, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained RobertaForSequenceClassification model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNXLMRobertaModel` ¶

RBLN implementation for XLMRobertaModel, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's XLMRobertaModel, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained XLMRobertaModel and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNXLMRobertaForSequenceClassification` ¶

RBLN implementation for XLMRobertaForSequenceClassification, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's XLMRobertaForSequenceClassification, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained XLMRobertaForSequenceClassification model and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNBertModel` ¶

RBLN implementation for BertModel, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's BertModel, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained BertModel and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_model_input_names=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_model_input_names`	`Optional[List[str]]`	A list of input names expected in the forward pass of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`
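
`rbln_model_input_names` lists the inputs the compiled model expects. One way to picture its effect is as a filter over a tokenizer's output. The sketch below is an illustration with plain dictionaries; `select_model_inputs` is a hypothetical helper rather than a library function, and the token ids are made-up sample values.

```python
from typing import Any, Dict, List


def select_model_inputs(
    encoded: Dict[str, Any], input_names: List[str]
) -> Dict[str, Any]:
    """Keep only the expected inputs, in the order given by input_names."""
    missing = [name for name in input_names if name not in encoded]
    if missing:
        raise KeyError(f"missing model inputs: {missing}")
    return {name: encoded[name] for name in input_names}


encoded = {
    "input_ids": [[101, 7592, 102]],
    "token_type_ids": [[0, 0, 0]],
    "attention_mask": [[1, 1, 1]],
}
inputs = select_model_inputs(encoded, ["input_ids", "attention_mask"])
```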

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNBartModel` ¶

RBLN implementation for BartModel, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's BartModel, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained BartModel and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_model_input_names=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_model_input_names`	`Optional[List[str]]`	A list of input names expected in the forward pass of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNT5EncoderModel` ¶

RBLN implementation for T5EncoderModel, optimized for execution on NPU devices.

This class provides an interface compatible with HuggingFace's T5EncoderModel, but with optimizations for RBLN NPUs. It implements two key methods:

from_pretrained: Loads a pre-trained T5EncoderModel and converts it into an optimized RBLN graph.
save_pretrained: Saves the compiled RBLN model for efficient reuse.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_max_seq_len=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub or a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_max_seq_len`	`Optional[int]`	The maximum sequence length of the model.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag to indicate whether to activate `RBLN Profiler` when running inference with the model.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

Classes¶

`RBLNLlavaNextForConditionalGeneration` ¶

RBLNLlavaNextForConditionalGeneration is a multi-modal model that combines vision and language processing capabilities, optimized for RBLN NPUs. It is designed for conditional generation tasks that involve both image and text inputs.

This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

Important Note

This model includes a Large Language Model (LLM) as a submodule. For optimal performance, it is highly recommended to use tensor parallelism for the language model. This can be achieved by using the rbln_config parameter in the from_pretrained method. Here's an example of how to apply tensor parallelism:

model = RBLNLlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    export=True,
    rbln_config={
        "language_model": {
            "tensor_parallel_size": 4,  # Apply tensor parallelism
            "max_seq_len": 32768,
            "use_inputs_embeds": True,
            "batch_size": 1,
            "activate_profiler": True,
        },
        "vision_feature_select_strategy": "default"
    },
)

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_vision_feature_select_strategy=None, rbln_config=None)

classmethod ¶

Load a pretrained RBLNLlavaNextForConditionalGeneration model from a given model ID and optimize it for RBLN NPUs.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be downloaded from the HuggingFace model hub, a local path, or a model id of a compiled model using the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_vision_feature_select_strategy`	`Optional[str]`	Strategy for selecting vision features. If not specified, the default strategy from the model config is used.	`None`
`rbln_config`	`Optional[Dict[str, Any]]`	A dictionary containing configurations for the main module and its submodules in RBLNLlavaNext. This is particularly important for applying tensor parallelism to the language model for optimal performance. Refer to the class docstring for an example of how to use this parameter.	`None`
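
A nested `rbln_config` like the one in the class description can also be assembled programmatically, which helps when the tensor parallel size varies between machines. The sketch below reuses only keys that appear in that example; `llava_next_rbln_config` is a hypothetical helper, and the library call is shown as a comment because it requires optimum-rbln.

```python
from typing import Any, Dict


def llava_next_rbln_config(
    tensor_parallel_size: int = 4,
    max_seq_len: int = 32768,
    batch_size: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: nested rbln_config with LLM submodule settings."""
    return {
        # Settings under "language_model" apply to the LLM submodule only.
        "language_model": {
            "tensor_parallel_size": tensor_parallel_size,
            "max_seq_len": max_seq_len,
            "use_inputs_embeds": True,
            "batch_size": batch_size,
        },
        # Top-level keys configure the main module.
        "vision_feature_select_strategy": "default",
    }


rbln_config = llava_next_rbln_config(tensor_parallel_size=8)
# Intended use (requires optimum-rbln; model_id is a placeholder):
# model = RBLNLlavaNextForConditionalGeneration.from_pretrained(
#     model_id, export=True, rbln_config=rbln_config
# )
```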

save_pretrained(save_directory) ¶

Save a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	Directory to which to save. Will be created if it doesn't exist.	required

Stable Diffusion¶

Classes¶

`RBLNStableDiffusionPipeline` ¶

Pipeline for text-to-image generation using Stable Diffusion optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionPipeline] for the core text-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() function is used in the same way as in the HuggingFace diffusers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model was previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation, given either as a dictionary or through rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler), per-component settings under keys matching component names (text_encoder, unet, vae), and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`
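
The LoRA parameters accept either a single value or a list, and the lists must line up with `lora_ids`. The sketch below shows one way a caller could normalize and check them before calling `from_pretrained`. `normalize_lora_args` is a hypothetical helper, not the library's own validation, and the repository and file names are made up.

```python
from typing import List, Optional, Tuple, Union


def normalize_lora_args(
    lora_ids: Union[str, List[str]],
    lora_weights_names: Optional[Union[str, List[str]]] = None,
    lora_scales: Optional[Union[float, List[float]]] = None,
) -> Tuple[List[str], Optional[List[str]], Optional[List[float]]]:
    """Turn single values into lists and check that the lengths agree."""
    ids = [lora_ids] if isinstance(lora_ids, str) else list(lora_ids)

    names = None
    if lora_weights_names is not None:
        if isinstance(lora_weights_names, str):
            names = [lora_weights_names]
        else:
            names = list(lora_weights_names)
        if len(names) != len(ids):
            raise ValueError("lora_weights_names must match the length of lora_ids")

    scales = None
    if lora_scales is not None:
        if isinstance(lora_scales, (int, float)):
            scales = [float(lora_scales)]
        else:
            scales = [float(s) for s in lora_scales]
        if len(scales) != len(ids):
            raise ValueError("lora_scales must match the length of lora_ids")

    return ids, names, scales


# Made-up identifiers, for illustration only.
ids, names, scales = normalize_lora_args(
    "example-user/lora-style", "weights.safetensors", 0.8
)
```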

Examples:

Using default configurations:

pipeline = RBLNStableDiffusionPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNStableDiffusionPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles model without loading to NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)
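
The comment on `guidance_scale` reflects how classifier-free guidance works in diffusers pipelines: when guidance is active (`guidance_scale > 1.0`), the UNet processes the unconditional and conditional inputs together, which doubles its batch. The sketch below illustrates that arithmetic. `unet_batch_size` is a hypothetical helper, and whether the library derives the value in exactly this way is an assumption.

```python
def unet_batch_size(batch_size: int, guidance_scale: float) -> int:
    """Batch size seen by the UNet under classifier-free guidance."""
    # Guidance is active above 1.0, so the unconditional and conditional
    # inputs are batched together and the UNet batch doubles.
    return batch_size * 2 if guidance_scale > 1.0 else batch_size


doubled = unet_batch_size(batch_size=1, guidance_scale=7.5)
```

With the values from the example above (a global batch size of 1 and a guidance scale of 7.5), this gives the UNet batch size of 2 that the `unet` entry sets explicitly.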

Using rbln_* kwargs:

pipeline = RBLNStableDiffusionPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2}
)
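
The two forms are equivalent because any `rbln_`-prefixed keyword argument is interpreted as an entry in `rbln_config`. The sketch below shows that mapping in plain Python. `split_rbln_kwargs` is a hypothetical helper written for illustration, not the library's internal implementation.

```python
from typing import Any, Dict, Tuple

PREFIX = "rbln_"


def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate rbln_* kwargs (prefix removed) from all other kwargs."""
    rbln_config: Dict[str, Any] = {}
    other: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key.startswith(PREFIX):
            rbln_config[key[len(PREFIX):]] = value  # "rbln_npu" -> "npu"
        else:
            other[key] = value
    return rbln_config, other


rbln_config, other = split_rbln_kwargs(
    {
        "export": True,
        "rbln_npu": "RBLN-CA02",
        "rbln_img_height": 512,
        "rbln_unet": {"batch_size": 2},
    }
)
```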

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion pipeline.

The function works similarly to the original StableDiffusionPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionPipelineOutput`	If return_dict is True, a `StableDiffusionPipelineOutput` is returned; otherwise a `tuple` is returned whose first element is a list of generated images.

Note

Several compile-time configurations cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored
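
Because mismatched values are ignored rather than rejected, a caller-side guard can make the mismatch visible. The sketch below is one possible guard, not library behavior; `resolve_image_size` is a hypothetical helper that raises an error instead of silently falling back to the compiled size.

```python
from typing import Optional, Tuple


def resolve_image_size(
    compiled: Tuple[int, int],
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the compiled (height, width), rejecting requests that differ."""
    compiled_h, compiled_w = compiled
    # None means "no preference", which is always compatible.
    if height not in (None, compiled_h) or width not in (None, compiled_w):
        raise ValueError(
            f"requested {height}x{width}, but the model was compiled for "
            f"{compiled_h}x{compiled_w}"
        )
    return compiled


size = resolve_image_size((512, 512), height=512)
```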

`RBLNStableDiffusionImg2ImgPipeline` ¶

Pipeline for image-to-image generation using Stable Diffusion optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionImg2ImgPipeline] for the core image-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() function is used in the same way as in the HuggingFace diffusers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model was previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation, given either as a dictionary or through rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler), per-component settings under keys matching component names (text_encoder, unet, vae), and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`
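The length constraint on the LoRA arguments can be illustrated with a small standalone check. The helper `normalize_lora_args` below is hypothetical and not part of the optimum-rbln API; it is a sketch, assuming a single value is treated as a one-element list, of how the arguments might be validated before they are passed to from_pretrained. The repository and file names are placeholders.

```python
from typing import List, Optional, Sequence, Tuple, Union

StrOrList = Union[str, Sequence[str]]
FloatOrList = Union[float, Sequence[float]]


def normalize_lora_args(
    lora_ids: StrOrList,
    lora_weights_names: Optional[StrOrList] = None,
    lora_scales: Optional[FloatOrList] = None,
) -> Tuple[List[str], Optional[List[str]], Optional[List[float]]]:
    """Hypothetical helper: wrap single values in lists and check lengths."""
    ids = [lora_ids] if isinstance(lora_ids, str) else list(lora_ids)

    names = None
    if lora_weights_names is not None:
        if isinstance(lora_weights_names, str):
            names = [lora_weights_names]
        else:
            names = list(lora_weights_names)
        if len(names) != len(ids):
            raise ValueError("lora_weights_names must match the length of lora_ids")

    scales = None
    if lora_scales is not None:
        if isinstance(lora_scales, (int, float)):
            scales = [float(lora_scales)]
        else:
            scales = [float(s) for s in lora_scales]
        if len(scales) != len(ids):
            raise ValueError("lora_scales must match the length of lora_ids")

    return ids, names, scales


# Placeholder repository and file names, for illustration only.
ids, names, scales = normalize_lora_args(
    ["user/lora-a", "user/lora-b"],
    ["a.safetensors", "b.safetensors"],
    [0.8, 0.5],
)
```

The normalized lists could then be passed as lora_ids, lora_weights_names, and lora_scales together with export=True.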

Examples:

Using default configurations:

pipeline = RBLNStableDiffusionImg2ImgPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using detailed rbln_config:

pipeline = RBLNStableDiffusionImg2ImgPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNStableDiffusionImg2ImgPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2},
)
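The equivalence between the rbln_config dictionary and the rbln_* kwargs can be pictured as a prefix-stripping step. The function `split_rbln_kwargs` below is a hypothetical illustration and not the library's internal code; it assumes that every keyword starting with rbln_ (other than rbln_config itself) maps to the key with the prefix removed, and that all other keywords are forwarded unchanged.

```python
from typing import Any, Dict, Tuple


def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate rbln_* kwargs from the remaining kwargs."""
    prefix = "rbln_"
    rbln_config: Dict[str, Any] = {}
    remaining: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key.startswith(prefix) and key != "rbln_config":
            # "rbln_batch_size" becomes the "batch_size" entry of rbln_config.
            rbln_config[key[len(prefix):]] = value
        else:
            remaining[key] = value
    return rbln_config, remaining


rbln_config, remaining = split_rbln_kwargs(
    {
        "rbln_npu": "RBLN-CA02",
        "rbln_batch_size": 1,
        "rbln_unet": {"batch_size": 2},
        "torch_dtype": "float32",  # placeholder non-RBLN kwarg, left untouched
    }
)
```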

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion img2img pipeline.

The function works similarly to the original StableDiffusionImg2ImgPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionPipelineOutput`	`StableDiffusionPipelineOutput` or `tuple`: if return_dict is True, a `StableDiffusionPipelineOutput` is returned; otherwise a tuple is returned whose first element is a list of generated images.

Note

The following compile-time configurations cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method

Any attempt to adjust these parameters during inference will be ignored.

`RBLNStableDiffusionInpaintPipeline` ¶

Pipeline for text-guided image inpainting using Stable Diffusion optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionInpaintPipeline] for the core text-guided image inpainting functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

from_pretrained() follows the same calling convention as in the HuggingFace diffusers library. Use it to load a pretrained model from a local directory or the HuggingFace Hub.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id (a string) of a pretrained model hosted inside a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler), per-component settings under keys matching component names (text_encoder, unet, vae), and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNStableDiffusionInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using detailed rbln_config:

pipeline = RBLNStableDiffusionInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)
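The guidance_scale comment in the configuration above reflects how classifier-free guidance works in Stable Diffusion pipelines: when guidance is active, the UNet processes the unconditional and conditional inputs together, so its batch is twice the pipeline batch. The function below is a hypothetical sketch of that relationship, assuming guidance is active whenever guidance_scale is greater than 1.0 (the convention used by diffusers); the library's exact rule may differ.

```python
def unet_batch_size(batch_size: int, guidance_scale: float) -> int:
    """Hypothetical helper: UNet batch size implied by classifier-free guidance."""
    # Assumption: guidance is active only when guidance_scale > 1.0.
    uses_guidance = guidance_scale > 1.0
    return batch_size * 2 if uses_guidance else batch_size


# Matches the example: global batch_size 1, guidance_scale 7.5, UNet batch_size 2.
example = unet_batch_size(batch_size=1, guidance_scale=7.5)
```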

Using rbln_* kwargs:

pipeline = RBLNStableDiffusionInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2},
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion inpaint pipeline.

The function works similarly to the original StableDiffusionInpaintPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionPipelineOutput`	`StableDiffusionPipelineOutput` or `tuple`: if return_dict is True, a `StableDiffusionPipelineOutput` is returned; otherwise a tuple is returned whose first element is a list of generated images.
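Code that must accept either return form can branch on the type of the result. The sketch below uses a stand-in dataclass, `FakePipelineOutput`, in place of the real output class so that it runs without diffusers installed; both the stand-in and the helper `extract_images` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass
class FakePipelineOutput:
    """Stand-in for StableDiffusionPipelineOutput, used only in this sketch."""

    images: List[Any]


def extract_images(output: Union[FakePipelineOutput, Tuple[Any, ...]]) -> List[Any]:
    """Hypothetical helper: get the image list from either return form."""
    if isinstance(output, tuple):
        # return_dict=False: the first tuple element is the list of images.
        return output[0]
    # return_dict=True: the images are exposed on the `images` attribute.
    return output.images


from_object = extract_images(FakePipelineOutput(images=["img0", "img1"]))
from_tuple = extract_images((["img0", "img1"],))
```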

Note

The following compile-time configurations cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method

Any attempt to adjust these parameters during inference will be ignored.
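Because a mismatched size is ignored rather than rejected, it can be useful to surface the mismatch in calling code. The helper `effective_image_size` below is hypothetical and not part of the library; it is a sketch, assuming the compiled height and width are known to the caller, that warns when a different size is requested and returns the size that will actually be used.

```python
import warnings
from typing import Optional, Tuple


def effective_image_size(
    compiled: Tuple[int, int],
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: return the (height, width) that inference will use."""
    compiled_height, compiled_width = compiled
    # Unspecified dimensions default to the compiled ones.
    requested = (height or compiled_height, width or compiled_width)
    if requested != compiled:
        warnings.warn(
            f"Requested size {requested} differs from compiled size {compiled}; "
            "the compiled size is used."
        )
    return compiled
```

To generate at a different resolution, the model would need to be compiled again with the new img_height and img_width.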

`RBLNStableDiffusionControlNetPipeline` ¶

Pipeline for text-to-image generation using Stable Diffusion and ControlNet optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionControlNetPipeline] for the core text-to-image and ControlNet functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

from_pretrained() follows the same calling convention as in the HuggingFace diffusers library. Use it to load a pretrained model from a local directory or the HuggingFace Hub.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id (a string) of a pretrained model hosted inside a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler), per-component settings under keys matching component names (text_encoder, text_encoder_2, unet, vae, controlnet), and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipeline = RBLNStableDiffusionControlNetPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    controlnet=controlnet,
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using detailed rbln_config:

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipeline = RBLNStableDiffusionControlNetPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    controlnet=controlnet,
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipeline = RBLNStableDiffusionControlNetPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    controlnet=controlnet,
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2},
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
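A common way to combine save_pretrained with the export flag is to compile once and reload the saved artifacts afterwards. The helper `loading_args` below is a hypothetical sketch of that decision; it assumes that a non-empty directory means save_pretrained has already written a compiled model there, which is a simplification and not a check the library itself performs.

```python
from pathlib import Path
from typing import Any, Dict, Union


def loading_args(model_id: str, compiled_dir: Union[str, Path]) -> Dict[str, Any]:
    """Hypothetical helper: pick from_pretrained arguments for a compile-once workflow."""
    compiled_dir = Path(compiled_dir)
    # Assumption: a non-empty directory holds a previously saved compiled model.
    if compiled_dir.is_dir() and any(compiled_dir.iterdir()):
        return {"model_id": str(compiled_dir), "export": False}
    # Otherwise compile from the original PyTorch weights.
    return {"model_id": model_id, "export": True}
```

The returned dictionary could be unpacked into from_pretrained; when export is True, a following save_pretrained(compiled_dir) call would make the next run take the faster path.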

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion ControlNet pipeline.

The function works similarly to the original StableDiffusionControlNetPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionPipelineOutput`	`StableDiffusionPipelineOutput` or `tuple`: if return_dict is True, a `StableDiffusionPipelineOutput` is returned; otherwise a tuple is returned whose first element is a list of generated images.

Note

The following compile-time configurations cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method

Any attempt to adjust these parameters during inference will be ignored.

`RBLNStableDiffusionControlNetImg2ImgPipeline` ¶

Pipeline for image-to-image generation using Stable Diffusion and ControlNet optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionControlNetImg2ImgPipeline] for the core image-to-image functionality with ControlNet
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

from_pretrained() follows the same calling convention as in the HuggingFace diffusers library. Use it to load a pretrained model from a local directory or the HuggingFace Hub.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id (a string) of a pretrained model hosted inside a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes), per-component settings under keys matching component names (text_encoder, unet, vae, controlnet), and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth")
pipeline = RBLNStableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "benjamin-paine/stable-diffusion-v1-5",
    controlnet=controlnet,
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion ControlNet img2img pipeline.

The function works similarly to the original StableDiffusionControlNetImg2ImgPipeline.__call__ method, but with optimizations for RBLN NPU inference. The core neural networks (VAE, UNet, text encoders, ControlNet) are compiled for NPU execution while the pipeline control logic remains dynamic.

Returns:

Type	Description
`StableDiffusionPipelineOutput`	`StableDiffusionPipelineOutput` or `tuple`: if return_dict is True, a `StableDiffusionPipelineOutput` is returned; otherwise a tuple is returned whose first element is a list of generated images.

Note

The following compile-time configurations cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method

Any attempt to adjust these parameters during inference will be ignored.

`RBLNStableDiffusion3Pipeline` ¶

Pipeline for text-to-image generation using Stable Diffusion 3 optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusion3Pipeline] for the core text-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

from_pretrained() follows the same calling convention as in the HuggingFace diffusers library. Use it to load a pretrained model from a local directory or the HuggingFace Hub.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id (a string) of a pretrained model hosted inside a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes), per-component settings under keys matching component names (text_encoder, text_encoder_2, text_encoder_3, transformer, vae), and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNStableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNStableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of transformer
        "transformer": {              # Transformer-specific settings
            "batch_size": 2,          # Override batch size for transformer
            "device": 0,              # Target NPU device ID for running transformer
        },
        "text_encoder_3": {
            "device": 1,              # Target NPU device ID for running text_encoder_3
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNStableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_transformer={"batch_size": 2, "device": 0},
    rbln_text_encoder_3={"device": 1},
)
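The per-component device entries in these examples can be read as overrides of the global device setting. The function `device_for` below is a hypothetical sketch of that reading and not the library's resolution logic; it assumes a component-level "device" takes precedence over the global one, and that device 0 is the fallback when neither is given.

```python
from typing import Any, Dict


def device_for(rbln_config: Dict[str, Any], component: str) -> int:
    """Hypothetical helper: component-level "device" if set, else the global one."""
    component_config = rbln_config.get(component, {})
    # Assumption: device 0 is used when no device is specified anywhere.
    return component_config.get("device", rbln_config.get("device", 0))


rbln_config = {
    "device": 0,
    "transformer": {"batch_size": 2, "device": 0},
    "text_encoder_3": {"device": 1},
}
placement = {
    name: device_for(rbln_config, name)
    for name in ("text_encoder", "transformer", "text_encoder_3", "vae")
}
```

Under these assumptions only text_encoder_3 is placed on device 1, while the remaining components stay on device 0.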

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion 3 pipeline.

The function works similarly to the original StableDiffusion3Pipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusion3PipelineOutput`	`StableDiffusion3PipelineOutput` or `tuple`: if return_dict is True, a `StableDiffusion3PipelineOutput` is returned; otherwise a tuple is returned whose first element is a list of generated images.

Note

The following compile-time configurations cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method

Any attempt to adjust these parameters during inference will be ignored.

`RBLNStableDiffusion3Img2ImgPipeline` ¶

Pipeline for image-to-image generation using Stable Diffusion 3 optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusion3Img2ImgPipeline] for the core image-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

from_pretrained() follows the same calling convention as in the HuggingFace diffusers library. Use it to load a pretrained model from a local directory or the HuggingFace Hub.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id (a string) of a pretrained model hosted inside a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes), per-component settings under keys matching component names (text_encoder, text_encoder_2, text_encoder_3, transformer, vae), and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNStableDiffusion3Img2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using detailed rbln_config:

pipeline = RBLNStableDiffusion3Img2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of transformer
        "transformer": {              # Transformer-specific settings
            "batch_size": 2,          # Override batch size for transformer
            "device": 0,              # Target NPU device ID for running transformer
        },
        "text_encoder_3": {
            "device": 1,              # Target NPU device ID for running text_encoder_3
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNStableDiffusion3Img2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_transformer={"batch_size": 2, "device": 0},
    rbln_text_encoder_3={"device": 1},
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion 3 img2img pipeline.

The function works similarly to the original StableDiffusion3Img2ImgPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusion3PipelineOutput`	`StableDiffusion3PipelineOutput` or `tuple`: if return_dict is True, a `StableDiffusion3PipelineOutput` is returned; otherwise a tuple is returned whose first element is a list of generated images.

Note

The following compile-time configurations cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method

Any attempt to adjust these parameters during inference will be ignored.

`RBLNStableDiffusion3InpaintPipeline` ¶

Pipeline for text-guided image inpainting using Stable Diffusion 3 optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusion3InpaintPipeline] for the core text-guided image inpainting functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the same conventions as its counterpart in the HuggingFace diffusers library. Use it to load a pretrained model, compiling it for RBLN NPUs first when export=True.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: Global settings: npu, device, create_runtimes Per-component settings under keys matching component names (text_encoder, text_encoder_2, text_encoder_3, transformer, vae) Image generation settings: batch_size, img_height, img_width, guidance_scale Defaults to empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`
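The three LoRA parameters accept either a single value or a list, and the lists must have the same length as `lora_ids`. A minimal sketch of that rule is shown below; `normalize_lora_args` is a hypothetical helper written for illustration (it is not the library's own validation), and the repository and file names are placeholders.

```python
def normalize_lora_args(lora_ids, lora_weights_names=None, lora_scales=None):
    """Turn scalar-or-list LoRA arguments into lists and check their lengths."""

    def as_list(value):
        if value is None:
            return None
        return list(value) if isinstance(value, (list, tuple)) else [value]

    ids = as_list(lora_ids) or []
    names = as_list(lora_weights_names)
    scales = as_list(lora_scales)

    # Optional arguments must line up one-to-one with lora_ids when given.
    for label, values in (("lora_weights_names", names), ("lora_scales", scales)):
        if values is not None and len(values) != len(ids):
            raise ValueError(
                f"{label} has {len(values)} entries but lora_ids has {len(ids)}"
            )
    return ids, names, scales


# Placeholder identifiers, for illustration only.
print(normalize_lora_args("user/lora-a", "weights.safetensors", 0.8))
```

Running a check like this before calling from_pretrained with export=True surfaces a length mismatch before a long compilation starts.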

Examples:

Using default configurations:

pipeline = RBLNStableDiffusion3InpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using detailed rbln_config:

pipeline = RBLNStableDiffusion3InpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale; used to determine the transformer batch size
        "transformer": {              # Transformer-specific settings
            "batch_size": 2,          # Override batch size for the transformer
            "device": 0,              # NPU device ID for running the transformer
        },
        "text_encoder_3": {
            "device": 1,              # NPU device ID for running text_encoder_3
        },
    }
)
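The nested dictionary above mixes global settings with per-component overrides. One way to picture the result is to merge the two levels per component, as in the sketch below. This is an illustration under stated assumptions, not the library's resolution logic: `resolve_component_config` is a hypothetical helper, and it assumes every non-dictionary entry is a global setting that a component inherits unless it overrides it.

```python
def resolve_component_config(rbln_config, component):
    """Return the effective settings for one component.

    Assumption: top-level entries whose value is not a dict are global
    settings; a dict stored under the component's name overrides them.
    """
    effective = {k: v for k, v in rbln_config.items() if not isinstance(v, dict)}
    effective.update(rbln_config.get(component, {}))
    return effective


rbln_config = {
    "device": 0,
    "batch_size": 1,
    "transformer": {"batch_size": 2, "device": 0},
    "text_encoder_3": {"device": 1},
}

for name in ("transformer", "text_encoder_3", "vae"):
    print(name, resolve_component_config(rbln_config, name))
```

Under this reading, the transformer runs with batch size 2 on device 0, text_encoder_3 keeps the global batch size but moves to device 1, and components without an entry use the global values.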

Using rbln_* kwargs:

pipeline = RBLNStableDiffusion3InpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_transformer={"batch_size": 2, "device": 0},
    rbln_text_encoder_3={"device": 1},
)
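The rbln_* form and the rbln_config form describe the same settings. The sketch below shows one way the prefixed kwargs could be folded into a dictionary; `split_rbln_kwargs` is a hypothetical helper for illustration, and the choice to let a prefixed kwarg overwrite a key already present in rbln_config is an assumption rather than documented behavior.

```python
def split_rbln_kwargs(kwargs, rbln_config=None):
    """Separate rbln_* kwargs from the rest and fold them into a config dict."""
    prefix = "rbln_"
    config = dict(rbln_config or {})
    remaining = {}
    for key, value in kwargs.items():
        if key.startswith(prefix) and key != "rbln_config":
            # rbln_batch_size=1 becomes {"batch_size": 1}
            config[key[len(prefix):]] = value
        else:
            remaining[key] = value
    return config, remaining


config, remaining = split_rbln_kwargs(
    {
        "rbln_npu": "RBLN-CA02",
        "rbln_batch_size": 1,
        "rbln_transformer": {"batch_size": 2, "device": 0},
        "use_safetensors": True,  # not an RBLN option; passed through untouched
    }
)
print(config)
print(remaining)
```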

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
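A common pattern is to compile once, save the result, and load the compiled artifacts on later runs. The sketch below illustrates that flow using only the from_pretrained and save_pretrained methods described here. `load_or_compile` is a hypothetical helper; the pipeline class is passed in as an argument so the sketch does not depend on a particular import, and treating any non-empty directory as a valid saved model is a simplifying assumption.

```python
from pathlib import Path


def load_or_compile(pipeline_cls, model_id, save_dir, **kwargs):
    """Load a compiled pipeline from save_dir, compiling and saving it first if needed."""
    save_dir = Path(save_dir)
    if save_dir.is_dir() and any(save_dir.iterdir()):
        # Already compiled: load the saved RBLN modules directly.
        return pipeline_cls.from_pretrained(str(save_dir), export=False, **kwargs)

    # First run: compile from the original weights, then save for reuse.
    pipeline = pipeline_cls.from_pretrained(model_id, export=True, **kwargs)
    pipeline.save_pretrained(str(save_dir))
    return pipeline
```

It could be called as `load_or_compile(RBLNStableDiffusion3InpaintPipeline, "stabilityai/stable-diffusion-3-medium-diffusers", "sd3-inpaint-rbln")`, where the directory name is a placeholder. The sketch does not detect configuration changes, so the directory should be removed when the model needs to be recompiled with different settings.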

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion 3 inpaint pipeline.

The function works similarly to the original `StableDiffusion3InpaintPipeline.__call__` method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusion3PipelineOutput`	[`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNStableDiffusionXLPipeline` ¶

Pipeline for text-to-image generation using Stable Diffusion XL optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionXLPipeline] for the core text-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the same conventions as its counterpart in the HuggingFace diffusers library. Use it to load a pretrained model, compiling it for RBLN NPUs first when export=True.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: Global settings: npu, device, create_runtimes, activate_profiler Per-component settings under keys matching component names (text_encoder, text_encoder_2, unet, vae) Image generation settings: batch_size, img_height, img_width, guidance_scale Defaults to empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original SDXL pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs (e.g., rbln_npu="RBLN-CA22" is equivalent to rbln_config={"npu": "RBLN-CA22"}).	`{}`

Examples:

Using default configurations:

pipeline = RBLNStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,                       # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",            # Target NPU model. If not specified, uses the installed NPU
        "device": 0,                   # Target NPU device ID. Defaults to 0
        "create_runtimes": True,       # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,     # If False, runs inference without profiling
        "batch_size": 1,               # Global batch size
        "img_height": 1024,            # Image height
        "img_width": 1024,             # Image width
        "guidance_scale": 7.5,         # Classifier-free guidance scale to determine batch size of unet
        "unet": {                      # UNet-specific settings
            "batch_size": 2            # Override batch size for UNet
        },
    }
)
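The comment on guidance_scale says it determines the UNet batch size. In diffusers, classifier-free guidance is active when the guidance scale is greater than 1, and the conditional and unconditional inputs are then run through the UNet as one batch of twice the size. The sketch below expresses that relationship; `unet_batch_size` is a hypothetical helper, and the assumption that the RBLN compiler derives the UNet batch size in exactly this way is not confirmed by this reference.

```python
def unet_batch_size(batch_size, guidance_scale):
    """Batch size seen by the UNet, assuming classifier-free guidance doubles it."""
    uses_cfg = guidance_scale > 1.0
    return batch_size * 2 if uses_cfg else batch_size


for scale in (7.5, 1.0):
    print(f"guidance_scale={scale}: unet batch size = {unet_batch_size(1, scale)}")
```

This is consistent with the example above, where a global batch_size of 1 with guidance_scale=7.5 is paired with a UNet batch_size of 2.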

Using rbln_* kwargs:

pipeline = RBLNStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_batch_size=1,
    rbln_img_height=1024,
    rbln_img_width=1024,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2},
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion XL pipeline.

The function works similarly to the original `StableDiffusionXLPipeline.__call__` method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionXLPipelineOutput`	[`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.
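Since the return type depends on return_dict, calling code that must handle both cases can normalize the result. The sketch below is illustrative: `generated_images` is a hypothetical helper, and it relies on the diffusers convention that the output object exposes the images through an `images` attribute. A stand-in object is used here so the example runs without a compiled pipeline.

```python
from types import SimpleNamespace


def generated_images(output):
    """Return the list of images from either form of pipeline output."""
    if isinstance(output, tuple):
        # return_dict=False: the first element is the list of images.
        return output[0]
    # return_dict=True: the output object carries an `images` attribute.
    return output.images


# Stand-ins for the two return forms; real outputs would hold PIL images.
as_object = SimpleNamespace(images=["image-0", "image-1"])
as_tuple = (["image-0", "image-1"],)

print(generated_images(as_object) == generated_images(as_tuple))
```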

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNStableDiffusionXLImg2ImgPipeline` ¶

Pipeline for image-to-image generation using Stable Diffusion XL optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionXLImg2ImgPipeline] for the core image-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the same conventions as its counterpart in the HuggingFace diffusers library. Use it to load a pretrained model, compiling it for RBLN NPUs first when export=True.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: Global settings: npu, device, create_runtimes, activate_profiler Per-component settings under keys matching component names (text_encoder, text_encoder_2, unet, vae) Image generation settings: batch_size, img_height, img_width, guidance_scale Defaults to empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNStableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using detailed rbln_config:

pipeline = RBLNStableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 1024,           # Image height
        "img_width": 1024,            # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNStableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=1024,
    rbln_img_width=1024,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2},
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion XL img2img pipeline.

The function works similarly to the original `StableDiffusionXLImg2ImgPipeline.__call__` method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionXLPipelineOutput`	[`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNStableDiffusionXLInpaintPipeline` ¶

Pipeline for text-guided image inpainting using Stable Diffusion XL optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionXLInpaintPipeline] for the core text-guided image inpainting functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the same conventions as its counterpart in the HuggingFace diffusers library. Use it to load a pretrained model, compiling it for RBLN NPUs first when export=True.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: Global settings: npu, device, create_runtimes, activate_profiler Per-component settings under keys matching component names (text_encoder, text_encoder_2, unet, vae) Image generation settings: batch_size, img_height, img_width, guidance_scale Defaults to empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNStableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using detailed rbln_config:

pipeline = RBLNStableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)
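The create_runtimes option makes it possible to compile on a host that has no NPU. In that case the target NPU cannot be detected from the machine, so it has to be named explicitly. The sketch below builds such a configuration; `compile_only_config` is a hypothetical helper for illustration, not part of the library.

```python
def compile_only_config(npu, **overrides):
    """Build an rbln_config for compiling on a host without an NPU.

    The NPU name must be given explicitly because there is no installed
    device to fall back on, and runtimes are not created.
    """
    if not npu:
        raise ValueError("An explicit NPU name is required for compile-only use")
    return {**overrides, "npu": npu, "create_runtimes": False}


config = compile_only_config("RBLN-CA02", img_height=512, img_width=512)
print(config)
```

The resulting dictionary can be passed as rbln_config together with export=True, and the compiled model saved with save_pretrained for later loading on a machine that has the NPU.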

Using rbln_* kwargs:

pipeline = RBLNStableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2},
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion XL inpaint pipeline.

The function works similarly to the original `StableDiffusionXLInpaintPipeline.__call__` method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionXLPipelineOutput`	[`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNStableDiffusionXLControlNetPipeline` ¶

Pipeline for text-to-image generation using Stable Diffusion XL and ControlNet optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionXLControlNetPipeline] for the core text-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the same conventions as its counterpart in the HuggingFace diffusers library. Use it to load a pretrained model, compiling it for RBLN NPUs first when export=True.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: Global settings: npu, device, create_runtimes, activate_profiler Per-component settings under keys matching component names (text_encoder, text_encoder_2, unet, vae, controlnet) Image generation settings: batch_size, img_height, img_width, guidance_scale Defaults to empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original SDXL controlnet pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs (e.g., rbln_npu="RBLN-CA22" is equivalent to rbln_config={"npu": "RBLN-CA22"}).	`{}`

Examples:

Using default configurations:

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipeline = RBLNStableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    controlnet=controlnet,
    vae=vae,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipeline = RBLNStableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    export=True,                       # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",            # Target NPU model. If not specified, uses the installed NPU
        "device": 0,                   # Target NPU device ID. Defaults to 0
        "create_runtimes": True,       # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,     # If False, runs inference without profiling
        "batch_size": 1,               # Global batch size
        "img_height": 1024,            # Image height
        "img_width": 1024,             # Image width
        "guidance_scale": 7.5,         # Classifier-free guidance scale to determine batch size of unet
        "unet": {                      # UNet-specific settings
            "batch_size": 2            # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipeline = RBLNStableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=1024,
    rbln_img_width=1024,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2},
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion XL controlnet pipeline.

The function works similarly to the original `StableDiffusionXLControlNetPipeline.__call__` method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionXLPipelineOutput`	[`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNStableDiffusionXLControlNetImg2ImgPipeline` ¶

Pipeline for image-to-image generation using Stable Diffusion XL and ControlNet optimized for RBLN NPUs.

This pipeline inherits from:

[StableDiffusionXLControlNetImg2ImgPipeline] for the core image-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the same conventions as its counterpart in the HuggingFace diffusers library. Use it to load a pretrained model, compiling it for RBLN NPUs first when export=True.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model id of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: Global settings: npu, device, create_runtimes, activate_profiler Per-component settings under keys matching component names (text_encoder, text_encoder_2, unet, vae, controlnet) Image generation settings: batch_size, img_height, img_width, guidance_scale Defaults to empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original SDXL pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs (e.g., rbln_npu="RBLN-CA22" is equivalent to rbln_config={"npu": "RBLN-CA22"}).	`{}`

Examples:

Using default configurations:

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0-small")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipeline = RBLNStableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    export=True,
    controlnet=controlnet,
    vae=vae,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipeline = RBLNStableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    export=True,                       # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",            # Target NPU model. If not specified, uses the installed NPU
        "device": 0,                   # Target NPU device ID. Defaults to 0
        "create_runtimes": True,       # If False, only compiles model without loading to NPU
        "activate_profiler": True,     # If False, runs inference without profiling
        "batch_size": 1,               # Global batch size
        "img_height": 1024,            # Image height
        "img_width": 1024,             # Image width
        "guidance_scale": 7.5,         # Classifier-free guidance scale to determine batch size of unet
        "unet": {                      # UNet-specific settings
            "batch_size": 2            # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipeline = RBLNStableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=1024,
    rbln_img_width=1024,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size":2}
)
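
The equivalence between the dictionary form and the rbln_* keyword form shown above can be pictured with a small, library-independent sketch. The helper name `fold_rbln_kwargs` is hypothetical and is not part of the RBLN API; it only illustrates the prefix-stripping rule described in this reference. How the library resolves a key given in both forms is not stated here, so letting the keyword win is an assumption of this sketch.

```python
from typing import Any, Dict, Optional, Tuple


def fold_rbln_kwargs(
    rbln_config: Optional[Dict[str, Any]] = None, **kwargs: Any
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split kwargs into (rbln_config, remaining kwargs).

    Every keyword that starts with "rbln_" loses the prefix and is stored
    in the config dictionary; everything else is passed through untouched.
    Assumption: an rbln_* kwarg overrides the same key in rbln_config.
    """
    prefix = "rbln_"
    config = dict(rbln_config or {})
    passthrough: Dict[str, Any] = {}
    for name, value in kwargs.items():
        if name.startswith(prefix):
            config[name[len(prefix):]] = value
        else:
            passthrough[name] = value
    return config, passthrough


from_kwargs, rest = fold_rbln_kwargs(
    rbln_npu="RBLN-CA02",
    rbln_batch_size=1,
    rbln_unet={"batch_size": 2},
    torch_dtype="float32",  # non-RBLN kwarg, passed through unchanged
)
from_dict = {"npu": "RBLN-CA02", "batch_size": 1, "unet": {"batch_size": 2}}
print(from_kwargs == from_dict)
print(rest)
```

Per-component dictionaries such as rbln_unet fold in the same way as scalar settings, which is why the two examples above describe the same compilation.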

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Stable Diffusion XL ControlNet img2img pipeline.

The function works similarly to the original StableDiffusionXLControlNetImg2ImgPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`StableDiffusionXLPipelineOutput`	[`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (height and width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNKandinskyV22PriorPipeline` ¶

Pipeline for generating image prior for Kandinsky V2.2 optimized for RBLN NPUs.

This pipeline inherits from:

[KandinskyV22PriorPipeline] for the core image prior generation functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the standard form used in the HuggingFace diffusers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model ID of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model was previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation, given either as a dictionary or through rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler); per-component settings under keys matching component names (text_encoder, image_encoder, prior); and prior generation settings (batch_size, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original Kandinsky V2.2 pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs (e.g., rbln_npu="RBLN-CA22" is equivalent to rbln_config={"npu": "RBLN-CA22"}).	`{}`
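
The length rule for the LoRA arguments in the table above can be checked before starting a long compilation. The following is a minimal sketch under stated assumptions: `normalize_lora_args` is a hypothetical helper, not a library function, and the repository IDs and file names in the usage lines are placeholders rather than real LoRA models.

```python
from typing import List, Optional, Sequence, Tuple, Union


def normalize_lora_args(
    lora_ids: Optional[Union[str, Sequence[str]]] = None,
    lora_weights_names: Optional[Union[str, Sequence[str]]] = None,
    lora_scales: Optional[Union[float, Sequence[float]]] = None,
) -> Tuple[List[str], Optional[List[str]], Optional[List[float]]]:
    """Hypothetical pre-check: coerce scalars to lists and compare lengths."""

    def as_list(value):
        if value is None:
            return None
        if isinstance(value, (str, int, float)):
            return [value]  # a single ID, file name, or scale
        return list(value)

    ids = as_list(lora_ids) or []
    names = as_list(lora_weights_names)
    scales = as_list(lora_scales)
    for label, values in (("lora_weights_names", names), ("lora_scales", scales)):
        if values is not None and len(values) != len(ids):
            raise ValueError(
                f"{label} has {len(values)} entries but lora_ids has {len(ids)}"
            )
    return ids, names, scales


# Placeholder identifiers, for illustration only.
ids, names, scales = normalize_lora_args(
    lora_ids=["your-org/lora-a", "your-org/lora-b"],
    lora_weights_names=["a.safetensors", "b.safetensors"],
    lora_scales=[0.8, 0.5],
)
print(ids, names, scales)
```

A mismatch such as two IDs with a single scale raises ValueError in this sketch; whether the library itself raises or handles the case differently is not stated in this reference.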

Examples:

Using default configurations:

pipeline = RBLNKandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNKandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    export=True,                       # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",            # Target NPU model. If not specified, uses the installed NPU
        "device": 0,                   # Target NPU device ID. Defaults to 0
        "create_runtimes": True,       # If False, only compiles model without loading to NPU
        "activate_profiler": True,     # If False, runs inference without profiling
        "batch_size": 1,               # Global batch size
        "guidance_scale": 7.5,         # Classifier-free guidance scale to determine batch size of prior
        "prior": {                     # Prior-specific settings
            "batch_size": 2            # Override batch size for prior
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNKandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_batch_size=1,
    rbln_guidance_scale=7.5,
    rbln_prior={"batch_size":2}
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
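
A common way to combine save_pretrained with the export flag is to compile once and load the saved artifacts on later runs. The sketch below assumes only the from_pretrained and save_pretrained methods documented here; the helper name `compile_once_then_reload` and the "non-empty directory means already compiled" rule are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Type, Union


def compile_once_then_reload(
    pipeline_cls: Type[Any],
    model_id: str,
    save_directory: Union[str, Path],
    **rbln_kwargs: Any,
) -> Any:
    """Compile on the first call, reuse the saved artifacts afterwards.

    `pipeline_cls` is expected to expose `from_pretrained` and
    `save_pretrained` as documented in this reference.
    Assumption: a non-empty `save_directory` means a previous compilation
    finished and was saved there.
    """
    save_directory = Path(save_directory)
    if save_directory.is_dir() and any(save_directory.iterdir()):
        # Already compiled: load the pre-compiled RBLN modules.
        return pipeline_cls.from_pretrained(str(save_directory), export=False)

    # First run: compile the PyTorch modules, then persist the result.
    pipeline = pipeline_cls.from_pretrained(model_id, export=True, **rbln_kwargs)
    pipeline.save_pretrained(str(save_directory))
    return pipeline
```

With the class from this section, a call could look like `compile_once_then_reload(RBLNKandinskyV22PriorPipeline, "kandinsky-community/kandinsky-2-2-prior", "kandinsky-2-2-prior-rbln")`, where the output directory name is a placeholder.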

__call__(*args, **kwargs) ¶

Generate priors using the RBLN-optimized Kandinsky V2.2 prior pipeline.

The function works similarly to the original KandinskyV22PriorPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`KandinskyPriorPipelineOutput`	[`~pipelines.kandinsky2_2.KandinskyPriorPipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.kandinsky2_2.KandinskyPriorPipelineOutput`] is returned, otherwise a tuple is returned where the first element is the generated image embeddings.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (img_height and img_width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNKandinskyV22Pipeline` ¶

Pipeline for text-to-image generation with given priors using Kandinsky V2.2 optimized for RBLN NPUs.

This pipeline inherits from:

[KandinskyV22Pipeline] for the core text-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the standard form used in the HuggingFace diffusers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model ID of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model was previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation, given either as a dictionary or through rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler); per-component settings under keys matching component names (unet, movq); and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNKandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNKandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles model without loading to NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 768,            # Image height
        "img_width": 768,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNKandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=768,
    rbln_img_width=768,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2}
)
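
The examples above pair a global batch_size of 1 with a UNet batch_size of 2 and note that guidance_scale is used "to determine batch size of unet". One reading, assumed here and not confirmed by this reference, is the usual classifier-free guidance convention: when guidance_scale is greater than 1.0, the UNet processes the conditional and unconditional inputs together, so its batch doubles. The function name below is hypothetical.

```python
def expected_unet_batch_size(batch_size: int, guidance_scale: float) -> int:
    """Assumed rule: classifier-free guidance (guidance_scale > 1.0) doubles the UNet batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    uses_cfg = guidance_scale > 1.0
    return batch_size * 2 if uses_cfg else batch_size


# Values taken from the examples above: batch_size=1, guidance_scale=7.5.
print(expected_unet_batch_size(batch_size=1, guidance_scale=7.5))
# With guidance effectively disabled, the batch stays as given.
print(expected_unet_batch_size(batch_size=1, guidance_scale=1.0))
```

Under this assumption, the explicit `"unet": {"batch_size": 2}` override in the examples matches what a guidance_scale of 7.5 would imply for a global batch size of 1.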

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Kandinsky V2.2 pipeline.

The function works similarly to the original KandinskyV22Pipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`ImagePipelineOutput`	[`~pipelines.ImagePipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.ImagePipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (img_height and img_width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNKandinskyV22Img2ImgPipeline` ¶

Pipeline for image-to-image generation with given priors using Kandinsky V2.2 optimized for RBLN NPUs.

This pipeline inherits from:

[KandinskyV22Img2ImgPipeline] for the core image-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the standard form used in the HuggingFace diffusers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model ID of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model was previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation, given either as a dictionary or through rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler); per-component settings under keys matching component names (unet, movq); and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNKandinskyV22Img2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNKandinskyV22Img2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles model without loading to NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 768,            # Image height
        "img_width": 768,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNKandinskyV22Img2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=768,
    rbln_img_width=768,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2}
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Kandinsky V2.2 img2img pipeline.

The function works similarly to the original KandinskyV22Img2ImgPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`ImagePipelineOutput`	[`~pipelines.ImagePipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.ImagePipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (img_height and img_width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored
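
Because a mismatched size is ignored rather than rejected, a caller may want an explicit warning before invoking the pipeline. The guard below is a hypothetical sketch, not library behavior: `check_call_size` is an illustrative name, and the `compiled` dictionary is assumed to mirror the img_height and img_width values passed in rbln_config at compile time.

```python
import warnings
from typing import Any, Dict, Optional


def check_call_size(
    compiled: Dict[str, Any],
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical guard: report the size that will actually be used.

    `compiled` mirrors the rbln_config passed at compile time and is
    assumed to contain "img_height" and "img_width".
    """
    effective = {"height": compiled["img_height"], "width": compiled["img_width"]}
    requested = {"height": height, "width": width}
    for key, value in requested.items():
        if value is not None and value != effective[key]:
            warnings.warn(
                f"{key}={value} differs from the compiled {key}={effective[key]}; "
                "the compiled value will be used.",
                stacklevel=2,
            )
    return effective


compiled_config = {"img_height": 768, "img_width": 768}
size = check_call_size(compiled_config, height=768, width=768)
print(size)
```

To generate at a different resolution, the model has to be compiled again with export=True and the new img_height and img_width.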

`RBLNKandinskyV22InpaintPipeline` ¶

Pipeline for image inpainting with given priors using Kandinsky V2.2 optimized for RBLN NPUs.

This pipeline inherits from:

[KandinskyV22InpaintPipeline] for the core image inpainting functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the standard form used in the HuggingFace diffusers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model ID of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model was previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation, given either as a dictionary or through rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler); per-component settings under keys matching component names (unet, movq); and image generation settings (batch_size, img_height, img_width, guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs.	`{}`

Examples:

Using default configurations:

pipeline = RBLNKandinskyV22InpaintPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNKandinskyV22InpaintPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    export=True,                      # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",           # Target NPU model
        "device": 0,                  # Target NPU device ID
        "create_runtimes": True,      # If False, only compiles model without loading to NPU
        "activate_profiler": True,    # If False, runs inference without profiling
        "batch_size": 1,              # Global batch size
        "img_height": 512,            # Image height
        "img_width": 512,             # Image width
        "guidance_scale": 7.5,        # Classifier-free guidance scale to determine batch size of unet
        "unet": {                     # UNet-specific settings
            "batch_size": 2           # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNKandinskyV22InpaintPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_unet={"batch_size": 2}
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Kandinsky V2.2 inpaint pipeline.

The function works similarly to the original KandinskyV22InpaintPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`ImagePipelineOutput`	[`~pipelines.ImagePipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.ImagePipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.
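
Since the return type depends on return_dict, calling code that must accept both shapes can normalize the result first. This is a minimal sketch: `extract_images` is a hypothetical helper, and it relies only on the two shapes described in the table above (an output object with an `images` attribute, or a tuple whose first element is the list of images).

```python
from typing import Any, List


def extract_images(result: Any) -> List[Any]:
    """Return the generated images from either return shape.

    With return_dict=True the result is an output object carrying an
    `images` attribute; otherwise it is a tuple whose first element is
    the list of images.
    """
    if isinstance(result, tuple):
        return result[0]
    return result.images
```

A caller can then write `images = extract_images(pipeline(...))` without branching on how the pipeline was invoked.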

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (img_height and img_width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNKandinskyV22CombinedPipeline` ¶

Combined pipeline for text-to-image generation using Kandinsky V2.2 optimized for RBLN NPUs.

This pipeline inherits from:

[KandinskyV22CombinedPipeline] for the core text-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() method follows the standard form used in the HuggingFace diffusers library. Use it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Either the model ID of a pretrained model hosted in a model repo on huggingface.co, or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	Controls model compilation behavior. If True, compiles PyTorch modules into RBLN format and loads them; use this when loading the model for the first time or when the model configuration has changed. If False, assumes submodules are already compiled and loads the pre-compiled RBLN modules; use this for faster loading when the model was previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation, given either as a dictionary or through rbln_* kwargs. Common configurations include global settings (npu, device, create_runtimes, activate_profiler); per-component settings under keys matching component names (prior_text_encoder, prior_image_encoder, prior_prior, unet, movq); and image generation settings (batch_size, img_height, img_width, guidance_scale, prior_guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str \| List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str \| List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float \| List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original Kandinsky V2.2 pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs (e.g., rbln_npu="RBLN-CA22" is equivalent to rbln_config={"npu": "RBLN-CA22"}).	`{}`

Examples:

Using default configurations:

pipeline = RBLNKandinskyV22CombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNKandinskyV22CombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,                       # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",            # Target NPU model. If not specified, uses the installed NPU
        "device": 0,                   # Target NPU device ID. Defaults to 0
        "create_runtimes": True,       # If False, only compiles model without loading to NPU
        "activate_profiler": True,     # If False, runs inference without profiling
        "batch_size": 1,               # Global batch size
        "img_height": 768,             # Image height
        "img_width": 768,              # Image width
        "guidance_scale": 7.5,         # Classifier-free guidance scale to determine batch size of unet
        "prior_pipe": {                # Prior pipeline-specific settings
            "prior": {                 # Settings for the prior model inside the prior pipeline
                "guidance_scale": 4.0  # Override guidance scale for prior in prior pipeline
            }                          # Same as `"prior_prior": {"guidance_scale": 4.0}` in "rbln_config"
        },
        "unet": {                      # UNet-specific settings
            "batch_size": 2            # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNKandinskyV22CombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=768,
    rbln_img_width=768,
    rbln_guidance_scale=7.5,
    rbln_prior_prior={"guidance_scale":4.0},
    rbln_unet={"batch_size":2}
)
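
The dictionary example nests the prior settings under "prior_pipe", while the keyword example uses the flattened name rbln_prior_prior; the comment in the dictionary example states that the two are equivalent. The naming rule can be sketched as follows. `flatten_prior_pipe` is a hypothetical helper written for illustration, and the library's own merging logic may differ.

```python
from typing import Any, Dict


def flatten_prior_pipe(rbln_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite {"prior_pipe": {name: cfg}} as {"prior_<name>": cfg}."""
    flat = {key: value for key, value in rbln_config.items() if key != "prior_pipe"}
    for name, sub_config in rbln_config.get("prior_pipe", {}).items():
        flat[f"prior_{name}"] = sub_config
    return flat


nested = {
    "guidance_scale": 7.5,
    "prior_pipe": {"prior": {"guidance_scale": 4.0}},
    "unet": {"batch_size": 2},
}
flat = flatten_prior_pipe(nested)
print(flat)
```

The same prefixing explains the component names listed for rbln_config in the parameter table: prior_text_encoder, prior_image_encoder, and prior_prior refer to the text_encoder, image_encoder, and prior components of the prior pipeline.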

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Kandinsky V2.2 combined pipeline.

The function works similarly to the original KandinskyV22CombinedPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`ImagePipelineOutput`	[`~pipelines.ImagePipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.ImagePipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (img_height and img_width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

`RBLNKandinskyV22Img2ImgCombinedPipeline` ¶

Combined pipeline for image-to-image generation using Kandinsky V2.2 optimized for RBLN NPUs.

This pipeline inherits from:

[KandinskyV22Img2ImgCombinedPipeline] for the core image-to-image functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() function is used in the same way as in the HuggingFace diffusers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Can be either: a string, the model id of a pretrained model hosted in a model repo on huggingface.co; or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: global settings (npu, device, create_runtimes, activate_profiler); per-component settings under keys matching component names (prior_text_encoder, prior_image_encoder, prior_prior, unet, movq); and image generation settings (batch_size, img_height, img_width, guidance_scale, prior_guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original Kandinsky V2.2 pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs (e.g., rbln_npu="RBLN-CA22" is equivalent to rbln_config={"npu": "RBLN-CA22"}).	`{}`
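
To make the equivalence between `rbln_*` kwargs and `rbln_config` entries concrete, here is a minimal sketch of how prefixed kwargs could be separated from the rest and turned into dictionary keys. `split_rbln_kwargs` is a hypothetical helper written for illustration only; it is not how optimum-rbln is implemented, and it does not attempt to merge with an explicitly passed `rbln_config`.

```python
from typing import Any, Dict, Tuple

RBLN_PREFIX = "rbln_"


def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate rbln_* kwargs (prefix stripped) from all other kwargs."""
    rbln_config: Dict[str, Any] = {}
    passthrough: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key.startswith(RBLN_PREFIX) and key != "rbln_config":
            # "rbln_npu" becomes the "npu" key, "rbln_unet" becomes "unet", ...
            rbln_config[key[len(RBLN_PREFIX):]] = value
        else:
            # Everything else would go to the original pipeline's from_pretrained.
            passthrough[key] = value
    return rbln_config, passthrough


config, rest = split_rbln_kwargs(
    {"rbln_npu": "RBLN-CA22", "rbln_unet": {"batch_size": 2}, "use_safetensors": True}
)
print(config)  # {'npu': 'RBLN-CA22', 'unet': {'batch_size': 2}}
print(rest)    # {'use_safetensors': True}
```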

Examples:

Using default configurations:

pipeline = RBLNKandinskyV22Img2ImgCombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNKandinskyV22Img2ImgCombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,                       # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",            # Target NPU model. If not specified, uses the installed NPU
        "device": 0,                   # Target NPU device ID. Defaults to 0
        "create_runtimes": True,       # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,     # If False, runs inference without profiling
        "batch_size": 1,               # Global batch size
        "img_height": 768,             # Image height
        "img_width": 768,              # Image width
        "guidance_scale": 7.5,         # Classifier-free guidance scale; determines the batch size of the UNet
        "prior_pipe": {                # Prior pipeline-specific settings
            "prior": {                 # Settings for the prior model inside the prior pipeline
                "guidance_scale": 4.0  # Override guidance scale for prior in prior pipeline
            }                          # Same as `"prior_prior": {"guidance_scale": 4.0}` in "rbln_config"
        },
        "unet": {                      # UNet-specific settings
            "batch_size": 2            # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNKandinskyV22Img2ImgCombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=768,
    rbln_img_width=768,
    rbln_guidance_scale=7.5,
    rbln_prior_prior={"guidance_scale":4.0},
    rbln_unet={"batch_size":2}
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Kandinsky V2.2 img2img combined pipeline.

The function works similarly to the original KandinskyV22Img2ImgCombinedPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`ImagePipelineOutput`	[`~pipelines.ImagePipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.ImagePipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (img_height and img_width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored
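
Because attempts to change these values at inference time are ignored rather than rejected, it can help to check them in application code before calling the pipeline. The following is a minimal sketch: `assert_compiled_size` and the `compiled` dictionary are hypothetical and stand for whatever record of the compile-time `img_height` and `img_width` your application keeps.

```python
from typing import Mapping, Optional


def assert_compiled_size(
    compiled: Mapping[str, int],
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> None:
    """Raise if a requested size differs from the size fixed at compile time."""
    requested = {"img_height": height, "img_width": width}
    for key, value in requested.items():
        # None means "use whatever the model was compiled with".
        if value is not None and value != compiled[key]:
            raise ValueError(
                f"{key}={value} was requested, but the model was compiled with "
                f"{key}={compiled[key]}; recompile with export=True to change it."
            )


compiled = {"img_height": 768, "img_width": 768}
assert_compiled_size(compiled, height=768, width=768)  # matches, passes silently
```

Failing loudly in this way turns a silently ignored argument into an explicit error at the call site.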

`RBLNKandinskyV22InpaintCombinedPipeline` ¶

Combined pipeline for inpainting generation using Kandinsky V2.2 optimized for RBLN NPUs.

This pipeline inherits from:

[KandinskyV22InpaintCombinedPipeline] for the core inpainting generation functionality
[RBLNDiffusionMixin] for RBLN NPU optimization capabilities

The pipeline provides NPU-optimized inference by:

Converting the original PyTorch model components into optimized RBLN graphs
Compiling these graphs for efficient execution on RBLN NPUs
Optionally fusing LoRA weights during compilation for customized generation

Functions¶

from_pretrained(model_id, export=False, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)

classmethod ¶

The from_pretrained() function is used in the same way as in the HuggingFace diffusers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	Can be either: a string, the model id of a pretrained model hosted in a model repo on huggingface.co; or a path to a directory containing a model saved using `save_pretrained`.	required
`export`	`bool`	A boolean flag that controls model compilation behavior: If True: Compiles PyTorch modules into RBLN format and loads them. Use this when loading the model for the first time or when model configuration has changed. If False: Assumes submodules are already compiled and loads pre-compiled RBLN modules. Use this for faster loading when the model has been previously compiled with the same configuration.	`False`
`rbln_config`	`Dict[str, Any]`	Configuration for RBLN compilation. Can be specified either as a dictionary or using rbln_* kwargs. Common configurations include: global settings (npu, device, create_runtimes, activate_profiler); per-component settings under keys matching component names (prior_text_encoder, prior_image_encoder, prior_prior, unet, movq); and image generation settings (batch_size, img_height, img_width, guidance_scale, prior_guidance_scale). Defaults to an empty dict.	`{}`
`lora_ids`	`str | List[str]`	Single LoRA model ID or list of model IDs from Hugging Face Hub to fuse during compilation. Only used when export=True.	`None`
`lora_weights_names`	`str | List[str]`	Names of the LoRA weight files within their repositories. Must match the length of lora_ids if provided.	`None`
`lora_scales`	`float | List[float]`	Scaling factors for the LoRA adapters. Must match the length of lora_ids if provided.	`None`
`**kwargs`	`Dict[str, Any]`	Additional arguments passed to the original Kandinsky V2.2 pipeline's from_pretrained method. RBLN configurations can also be specified using rbln_* prefixed kwargs (e.g., rbln_npu="RBLN-CA22" is equivalent to rbln_config={"npu": "RBLN-CA22"}).	`{}`
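
The three `lora_*` parameters accept either a single value or a list, and the lists have to line up with `lora_ids`. A minimal sketch of that normalization follows. `normalize_lora_args` is a hypothetical helper for illustration and does not reproduce the library's own validation; the repository and file names in the usage line are placeholders.

```python
def _as_list(value):
    """Wrap a single value in a list; leave None alone and copy sequences."""
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def normalize_lora_args(lora_ids=None, lora_weights_names=None, lora_scales=None):
    """Return (ids, names, scales) as lists, checking that lengths agree."""
    ids = _as_list(lora_ids)
    names = _as_list(lora_weights_names)
    scales = _as_list(lora_scales)
    if ids is None:
        if names is not None or scales is not None:
            raise ValueError("lora_weights_names and lora_scales require lora_ids")
        return None, None, None
    for label, values in (("lora_weights_names", names), ("lora_scales", scales)):
        if values is not None and len(values) != len(ids):
            raise ValueError(
                f"{label} has {len(values)} entries but lora_ids has {len(ids)}"
            )
    return ids, names, scales


print(normalize_lora_args("user/lora-a", "lora.safetensors", 0.8))
# (['user/lora-a'], ['lora.safetensors'], [0.8])
```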

Examples:

Using default configurations:

pipeline = RBLNKandinskyV22InpaintCombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    export=True,
    # When rbln_config is not provided, the pipeline automatically determines appropriate default values
    # for all necessary configurations (image size, NPU settings, etc.) based on the model's
    # configuration and runtime environment
)

Using rbln_config dictionary:

pipeline = RBLNKandinskyV22InpaintCombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    export=True,                       # Compile PyTorch modules to RBLN format
    rbln_config={
        "npu": "RBLN-CA02",            # Target NPU model. If not specified, uses the installed NPU
        "device": 0,                   # Target NPU device ID. Defaults to 0
        "create_runtimes": True,       # If False, only compiles the model without loading it onto the NPU
        "activate_profiler": True,     # If False, runs inference without profiling
        "batch_size": 1,               # Global batch size
        "img_height": 512,             # Image height
        "img_width": 512,              # Image width
        "guidance_scale": 7.5,         # Classifier-free guidance scale; determines the batch size of the UNet
        "prior_pipe": {                # Prior pipeline-specific settings
            "prior": {                 # Settings for the prior model inside the prior pipeline
                "guidance_scale": 4.0  # Override guidance scale for prior in prior pipeline
            }                          # Same as `"prior_prior": {"guidance_scale": 4.0}` in "rbln_config"
        },
        "unet": {                      # UNet-specific settings
            "batch_size": 2            # Override batch size for UNet
        },
    }
)

Using rbln_* kwargs:

pipeline = RBLNKandinskyV22InpaintCombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint",
    export=True,
    rbln_npu="RBLN-CA02",
    rbln_device=0,
    rbln_create_runtimes=True,
    rbln_activate_profiler=True,
    rbln_batch_size=1,
    rbln_img_height=512,
    rbln_img_width=512,
    rbln_guidance_scale=7.5,
    rbln_prior_prior={"guidance_scale":4.0},
    rbln_unet={"batch_size":2}
)

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

__call__(*args, **kwargs) ¶

Generate images using the RBLN-optimized Kandinsky V2.2 inpaint combined pipeline.

The function works similarly to the original KandinskyV22InpaintCombinedPipeline.__call__ method, but with optimizations for RBLN NPU inference.

Returns:

Type	Description
`ImagePipelineOutput`	[`~pipelines.ImagePipelineOutput`] or `tuple`: If return_dict is True, a [`~pipelines.ImagePipelineOutput`] is returned, otherwise a tuple is returned where the first element is a list of generated images.

Note

There are several compile-time configurations that cannot be modified during inference:

Image dimensions (img_height and img_width) must be specified during model compilation
LoRA scales must be set during compilation using the from_pretrained method
Any attempt to adjust these parameters during inference will be ignored

Audio¶

Classes¶

`RBLNASTForAudioClassification` ¶

Audio Spectrogram Transformer model with an audio classification head on top (a linear layer on top of the pooled output) e.g. for datasets like AudioSet, Speech Commands v2. This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

A class to convert and run pre-trained transformer-based ASTForAudioClassification models on RBLN devices. It implements the methods to convert a pre-trained transformers ASTForAudioClassification model into an RBLN transformer model by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Currently, this model class only supports the 'AST' model from the transformers library. Future updates may include support for additional model types.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_activate_profiler`	`Optional[bool]`	A flag indicating whether to activate the `RBLN Profiler` during inference.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
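
As an illustration of `rbln_create_runtimes=False`, the sketch below compiles a model on a host without an NPU and saves the result for later deployment. `compile_only_kwargs` and `compile_for_target` are hypothetical helpers; the model class (for example `RBLNASTForAudioClassification`) is passed in as an argument so that no import path is assumed. Following the description of `rbln_npu`, the target NPU name is given explicitly, since there is no installed NPU to detect.

```python
def compile_only_kwargs(target_npu: str, batch_size: int = 1) -> dict:
    """Build from_pretrained kwargs for compiling without creating runtimes."""
    return {
        "export": True,                 # compile the PyTorch model
        "rbln_npu": target_npu,         # must be explicit on a host without an NPU
        "rbln_create_runtimes": False,  # do not load the model onto an NPU
        "rbln_batch_size": batch_size,
    }


def compile_for_target(model_cls, model_id: str, save_directory: str, target_npu: str):
    """Compile `model_id` for `target_npu` and write the result to disk."""
    model = model_cls.from_pretrained(model_id, **compile_only_kwargs(target_npu))
    model.save_pretrained(save_directory)
    return model
```

On the machine that has the NPU, the saved directory can then be loaded with `from_pretrained(save_directory, export=False)`.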

`RBLNWav2Vec2ForCTC` ¶

Wav2Vec2 Model with a language modeling head on top for Connectionist Temporal Classification (CTC).

This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

It implements the methods to convert a pre-trained Wav2Vec2ForCTC into a RBLNWav2Vec2ForCTC by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_activate_profiler`	`Optional[bool]`	A flag indicating whether to activate the `RBLN Profiler` during inference.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNWhisperForConditionalGeneration` ¶

The Whisper Model with a language modeling head. Can be used for automatic speech recognition.

This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

A class to convert and run pre-trained transformer-based WhisperForConditionalGeneration models on RBLN devices. It implements the methods to convert a pre-trained transformers WhisperForConditionalGeneration into an RBLNWhisperForConditionalGeneration by:

transferring the checkpoint weights of the original into an optimized RBLN graph,
compiling the resulting graph using the RBLN Compiler.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_token_timestamps=False, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained function is used in the same way as in the HuggingFace transformers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_token_timestamps`	`bool`	A boolean flag to compile the model for generating word-level timestamps during inference.	`False`
`rbln_activate_profiler`	`Optional[bool]`	A flag indicating whether to activate the `RBLN Profiler` during inference.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

generate(input_features, return_timestamps=None, task=None, language=None, is_multilingual=None, attention_mask=None, return_token_timestamps=None, return_segments=False, return_dict_in_generate=None)

¶

The generate function is used in the same way as in the HuggingFace transformers library. Users can call it to generate text from the model.

Parameters:

Name	Type	Description	Default
`input_features`	`Tensor`	Float values of log-mel features extracted from the raw speech waveform.	required
`return_timestamps`	`Optional[bool]`	Whether to return the timestamps with the text.	`None`
`task`	`Optional[str]`	Task to use for generation, either "translate" or "transcribe". The `model.config.forced_decoder_ids` will be updated accordingly.	`None`
`language`	`Optional[Union[str, List[str]]]`	Language token to use for generation, can be either in the form of `<|en|>`, `en` or `english`.	`None`
`is_multilingual`	`Optional[bool]`	Whether or not the model is multilingual.	`None`
`attention_mask`	`Optional[Tensor]`	`attention_mask` needs to be passed when doing long-form transcription using a batch size > 1.	`None`
`return_token_timestamps`	`Optional[bool]`	Whether to return token-level timestamps with the text. This can be used with or without the `return_timestamps` option.	`None`
`return_segments`	`bool`	Whether to additionally return a list of all segments. Note that this option can only be enabled when doing long-form transcription.	`False`
`return_dict_in_generate`	`Optional[bool]`	Whether or not to return a [`~utils.ModelOutput`] instead of just returning the generated tokens. Note that when doing long-form transcription, `return_dict_in_generate` can only be enabled when `return_segments` is set to True. In this case the generation output of each segment is added to each segment.	`None`

Returns: torch.Tensor
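
A minimal sketch of how `generate` might fit into a transcription helper. It assumes `model` is an already loaded `RBLNWhisperForConditionalGeneration` and `processor` is the matching `WhisperProcessor` from transformers; `transcribe` itself is a hypothetical helper, and the 16 kHz default reflects the sampling rate Whisper feature extractors expect.

```python
def transcribe(model, processor, audio, sampling_rate: int = 16000, language=None):
    """Turn a raw waveform into text using the documented generate() arguments."""
    # Convert the raw waveform into log-mel input features.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    generated_ids = model.generate(
        input_features=inputs.input_features,
        task="transcribe",   # or "translate"
        language=language,   # e.g. "en", "english" or "<|en|>"; None leaves it unset
    )
    # Decode token ids back to text, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)
```

Passing the model and processor in as arguments keeps the helper independent of how they were loaded or compiled.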

Computer Vision¶

Classes¶

`RBLNDPTForDepthEstimation` ¶

DPT Model with a depth estimation head on top (consisting of 3 convolutional layers). This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_image_size=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() function is used in the same way as in the HuggingFace transformers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_image_size`	`Optional[Union[int, List[int]]]`	The size of the image.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag indicating whether to activate the `RBLN Profiler` during inference.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNResNetForImageClassification` ¶

ResNet Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for ImageNet. This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_image_size=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() function is used in the same way as in the HuggingFace transformers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_image_size`	`Optional[Union[int, List[int]]]`	The size of the image.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag indicating whether to activate the `RBLN Profiler` during inference.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required

`RBLNViTImageClassification` ¶

ViT Model transformer with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet. This model inherits from [RBLNModel]. Check the superclass documentation for the generic methods the library implements for all its models.

Functions¶

from_pretrained(model_id, export=False, rbln_npu=None, rbln_device=0, rbln_create_runtimes=None, rbln_batch_size=1, rbln_image_size=None, rbln_activate_profiler=None)

classmethod ¶

The from_pretrained() function is used in the same way as in the HuggingFace transformers library. Users can call it to load a pre-trained model.

Parameters:

Name	Type	Description	Default
`model_id`	`Union[str, Path]`	The model id of the pre-trained model to be loaded. It can be a model id on the HuggingFace model hub, a local path, or the model id of a model already compiled with the RBLN Compiler.	required
`export`	`bool`	A boolean flag to indicate if the model should be exported to a `.rbln` file.	`False`
`rbln_npu`	`Optional[str]`	The name of the NPU to be used. If not specified, the NPU installed on the host machine is used. If no NPU is installed on the host machine, an error occurs.	`None`
`rbln_device`	`Optional[int]`	The device to be used at runtime. If not specified, device 0 is used.	`0`
`rbln_create_runtimes`	`Optional[bool]`	A flag to indicate whether to create runtime objects. If False, the runtime does not load the model onto the NPU. This option is particularly useful when you want to perform compilation only on a host machine without an NPU.	`None`
`rbln_batch_size`	`Optional[int]`	The batch size of the model.	`1`
`rbln_image_size`	`Optional[Union[int, List[int]]]`	The size of the image.	`None`
`rbln_activate_profiler`	`Optional[bool]`	A flag indicating whether to activate the `RBLN Profiler` during inference.	`None`

save_pretrained(save_directory) ¶

Saves a model and its configuration file to a directory, so that it can be re-loaded using the [from_pretrained] class method.

Parameters:

Name	Type	Description	Default
`save_directory`	`Union[str, PathLike]`	The directory to save the model and its configuration files. Will be created if it doesn't exist.	required
