Kandinsky V2.2¶
Kandinsky V2.2 is a text-to-image latent diffusion model. Optimum RBLN accelerates Kandinsky V2.2 pipelines on RBLN NPUs.
Supported Pipelines¶
Optimum RBLN supports several Kandinsky V2.2 pipelines:
- Text-to-Image: generate an image from a text prompt (uses the Prior + Decoder)
- Image-to-Image: modify an existing image based on a text prompt (uses the Prior + Img2Img Decoder)
- Inpainting: fill the masked region of an image according to a text prompt (uses the Prior + Inpaint Decoder)
Important: Batch Size Configuration for Guidance Scale¶
Batch size and guidance scale
When Kandinsky V2.2 runs with a guidance scale > 1.0 (the default), the effective batch size of both the UNet and the Prior doubles at runtime because of classifier-free guidance.
Because RBLN NPUs use static graph compilation, the batch size these components are compiled with must match the runtime batch size; otherwise, inference fails with an error.
Default Behavior¶
If you do not explicitly set batch sizes for the UNet or the Prior, Optimum RBLN:
- assumes the default guidance scale (greater than 1.0) will be used
- automatically sets the UNet and Prior batch sizes to twice the pipeline batch size
This automatic setting works correctly as long as you use the default guidance scale. If you use a different guidance scale or need finer control, set the batch sizes explicitly.
Example: Setting Batch Sizes Explicitly (guidance scale = 1.0)¶
If you plan to use a guidance scale of exactly 1.0 (no classifier-free guidance), explicitly set the batch sizes to match your inference batch size:
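A minimal sketch of explicit batch-size settings (the model IDs and the exact `rbln_config` keys are assumptions based on the configuration classes in the API reference; check them against your Optimum RBLN version):

```python
from optimum.rbln import RBLNKandinskyV22PriorPipeline, RBLNKandinskyV22Pipeline

BATCH_SIZE = 1  # the batch size you will actually use at inference

# With guidance_scale == 1.0 there is no classifier-free guidance,
# so the Prior and UNet batch sizes must equal the pipeline batch size
# rather than the doubled default.
prior = RBLNKandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    export=True,
    rbln_config={
        "guidance_scale": 1.0,
        "batch_size": BATCH_SIZE,
        "prior": {"batch_size": BATCH_SIZE},
    },
)
decoder = RBLNKandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    export=True,
    rbln_config={
        "guidance_scale": 1.0,
        "batch_size": BATCH_SIZE,
        "unet": {"batch_size": BATCH_SIZE},
    },
)
```

At inference time, also pass `guidance_scale=1.0` to the pipeline calls so the runtime batch matches what was compiled.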
Usage Examples¶
Method 1: Using Separate Prior and Decoder Pipelines¶
This approach gives you finer control over the intermediate image embeddings:
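A sketch of the two-stage flow (the model IDs are the community Kandinsky 2.2 checkpoints and are an assumption here; the call signatures follow the underlying diffusers pipelines):

```python
from optimum.rbln import RBLNKandinskyV22PriorPipeline, RBLNKandinskyV22Pipeline

# Stage 1: compile/load the prior, which maps text to image embeddings.
prior = RBLNKandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", export=True
)
# Stage 2: compile/load the decoder, which turns embeddings into pixels.
decoder = RBLNKandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", export=True
)

prompt = "a lighthouse on a cliff at sunset, oil painting"
prior_output = prior(prompt)  # yields image_embeds / negative_image_embeds

image = decoder(
    image_embeds=prior_output.image_embeds,
    negative_image_embeds=prior_output.negative_image_embeds,
    height=512,
    width=512,
).images[0]
image.save("lighthouse.png")
```

Because the embeddings are explicit here, you can cache, interpolate, or batch them before invoking the decoder.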
Method 2: Using the Combined Pipeline¶
The combined pipeline wraps the Prior and the Decoder in a single seamless workflow:
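A sketch of the combined flow (the checkpoint ID is an assumption; saving and reloading follows the usual Optimum `save_pretrained`/`from_pretrained` pattern):

```python
from optimum.rbln import RBLNKandinskyV22CombinedPipeline

# One object compiles and drives both the prior and the decoder.
pipe = RBLNKandinskyV22CombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", export=True
)
image = pipe(
    "a lighthouse on a cliff at sunset, oil painting",
    height=512,
    width=512,
).images[0]
image.save("lighthouse_combined.png")

# Save the compiled artifacts so later runs can skip compilation:
pipe.save_pretrained("./kandinsky22-combined-rbln")
# Reload without recompiling:
# RBLNKandinskyV22CombinedPipeline.from_pretrained("./kandinsky22-combined-rbln", export=False)
```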
API 참조¶
Classes¶
RBLNKandinskyV22Pipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22Pipeline
RBLN-accelerated implementation of Kandinsky 2.2 pipeline for text-to-image generation.
This pipeline compiles Kandinsky 2.2 models to run efficiently on RBLN NPUs, enabling high-performance inference for generating images with distinctive artistic style and enhanced visual quality.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID on the HuggingFace Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22PriorPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22PriorPipeline
RBLN-accelerated implementation of Kandinsky 2.2 prior pipeline for text and image embedding generation.
This pipeline compiles Kandinsky 2.2 prior models to run efficiently on RBLN NPUs, enabling high-performance inference for generating image embeddings from text prompts and image inputs for downstream generation tasks.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID on the HuggingFace Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22Img2ImgPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22Img2ImgPipeline
RBLN-accelerated implementation of Kandinsky 2.2 pipeline for image-to-image generation.
This pipeline compiles Kandinsky 2.2 models to run efficiently on RBLN NPUs, enabling high-performance inference for transforming input images with distinctive artistic style and enhanced visual fidelity.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID on the HuggingFace Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22InpaintPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22InpaintPipeline
RBLN-accelerated implementation of Kandinsky 2.2 pipeline for image inpainting.
This pipeline compiles Kandinsky 2.2 models to run efficiently on RBLN NPUs, enabling high-performance inference for filling masked regions with distinctive artistic style and seamless content integration.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID on the HuggingFace Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22CombinedPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22CombinedPipeline
RBLN-accelerated implementation of Kandinsky 2.2 combined pipeline for end-to-end text-to-image generation.
This pipeline compiles both prior and decoder Kandinsky 2.2 models to run efficiently on RBLN NPUs, enabling high-performance inference for complete text-to-image generation with distinctive artistic style.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID on the HuggingFace Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22Img2ImgCombinedPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22Img2ImgCombinedPipeline
RBLN-accelerated implementation of Kandinsky 2.2 combined pipeline for end-to-end image-to-image generation.
This pipeline compiles both prior and decoder Kandinsky 2.2 models to run efficiently on RBLN NPUs, enabling high-performance inference for complete image-to-image transformation with distinctive artistic style.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID on the HuggingFace Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22InpaintCombinedPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22InpaintCombinedPipeline
RBLN-accelerated implementation of Kandinsky 2.2 combined pipeline for end-to-end image inpainting.
This pipeline compiles both prior and decoder Kandinsky 2.2 models to run efficiently on RBLN NPUs, enabling high-performance inference for complete image inpainting with distinctive artistic style and seamless integration.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID on the HuggingFace Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |

Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
Classes¶
RBLNKandinskyV22PipelineBaseConfig¶
Bases: RBLNModelConfig
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, height=None, width=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | `None` |
| `movq` | `Optional[RBLNVQModelConfig]` | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Dimensions for the generated images. Cannot be used together with img_height/img_width. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
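For illustration, a hypothetical configuration showing the doubling rule (the import path and constructor usage are assumed from this reference; verify against your Optimum RBLN version):

```python
from optimum.rbln import RBLNKandinskyV22PipelineConfig  # assumed import path

# batch_size is the pipeline batch. Because guidance_scale > 1.0, the
# UNet itself is compiled at batch 2: classifier-free guidance runs the
# conditional and unconditional passes together.
config = RBLNKandinskyV22PipelineConfig(
    batch_size=1,
    guidance_scale=4.0,
    img_height=512,
    img_width=512,
)
```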
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNKandinskyV22PipelineConfig¶
Bases: RBLNKandinskyV22PipelineBaseConfig
Configuration class for the Kandinsky V2.2 text-to-image decoder pipeline.
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, height=None, width=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | `None` |
| `movq` | `Optional[RBLNVQModelConfig]` | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Dimensions for the generated images. Cannot be used together with img_height/img_width. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNKandinskyV22Img2ImgPipelineConfig¶
Bases: RBLNKandinskyV22PipelineBaseConfig
Configuration class for the Kandinsky V2.2 image-to-image decoder pipeline.
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, height=None, width=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | `None` |
| `movq` | `Optional[RBLNVQModelConfig]` | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Dimensions for the generated images. Cannot be used together with img_height/img_width. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNKandinskyV22InpaintPipelineConfig¶
Bases: RBLNKandinskyV22PipelineBaseConfig
Configuration class for the Kandinsky V2.2 inpainting decoder pipeline.
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, height=None, width=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | `None` |
| `movq` | `Optional[RBLNVQModelConfig]` | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Dimensions for the generated images. Cannot be used together with img_height/img_width. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNKandinskyV22PriorPipelineConfig¶
Bases: RBLNModelConfig
Configuration class for the Kandinsky V2.2 Prior pipeline.
Functions¶
__init__(text_encoder=None, image_encoder=None, prior=None, *, batch_size=None, guidance_scale=None, **kwargs)¶
Initialize a configuration for Kandinsky 2.2 prior pipeline optimized for RBLN NPU.
This configuration sets up the prior components of the Kandinsky 2.2 architecture, which includes text and image encoders along with a prior transformer that maps text/image embeddings to latent representations used to condition the diffusion process.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text_encoder` | `Optional[RBLNCLIPTextModelWithProjectionConfig]` | Configuration for the text encoder component. Initialized as RBLNCLIPTextModelWithProjectionConfig if not provided. | `None` |
| `image_encoder` | `Optional[RBLNCLIPVisionModelWithProjectionConfig]` | Configuration for the image encoder component. Initialized as RBLNCLIPVisionModelWithProjectionConfig if not provided. | `None` |
| `prior` | `Optional[RBLNPriorTransformerConfig]` | Configuration for the prior transformer component. Initialized as RBLNPriorTransformerConfig if not provided. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |
Note
When guidance_scale > 1.0, the prior batch size is automatically doubled to accommodate classifier-free guidance.
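For illustration, a hypothetical prior configuration (the import path is an assumption; verify against your Optimum RBLN version):

```python
from optimum.rbln import RBLNKandinskyV22PriorPipelineConfig  # assumed import path

# With guidance_scale > 1.0 the prior transformer is compiled at twice
# batch_size to hold the conditional/unconditional pair side by side.
config = RBLNKandinskyV22PriorPipelineConfig(
    batch_size=1,
    guidance_scale=4.0,
)
```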
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNKandinskyV22CombinedPipelineBaseConfig¶
Bases: RBLNModelConfig
Base configuration class for Kandinsky V2.2 combined pipelines.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, height=None, width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Initialize a configuration for combined Kandinsky 2.2 pipelines optimized for RBLN NPU.
This configuration integrates both the prior and decoder components of Kandinsky 2.2 into a unified pipeline, allowing for end-to-end text-to-image generation in a single model. It combines the text/image encoding, prior mapping, and diffusion steps together.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prior_pipe` | `Optional[RBLNKandinskyV22PriorPipelineConfig]` | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | `None` |
| `decoder_pipe` | `Optional[RBLNKandinskyV22PipelineConfig]` | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Dimensions for the generated images. Cannot be used together with img_height/img_width. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `prior_prior` | `Optional[RBLNPriorTransformerConfig]` | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | `None` |
| `prior_image_encoder` | `Optional[RBLNCLIPVisionModelWithProjectionConfig]` | Direct configuration for the image encoder. Used if prior_pipe is not provided. | `None` |
| `prior_text_encoder` | `Optional[RBLNCLIPTextModelWithProjectionConfig]` | Direct configuration for the text encoder. Used if prior_pipe is not provided. | `None` |
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Direct configuration for the UNet. Used if decoder_pipe is not provided. | `None` |
| `movq` | `Optional[RBLNVQModelConfig]` | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |
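For illustration, a hypothetical combined configuration using the direct submodule overrides (the import path is an assumption, and whether plain dicts are accepted in place of submodule config objects depends on the Optimum RBLN version):

```python
from optimum.rbln import RBLNKandinskyV22CombinedPipelineConfig  # assumed import path

# Direct overrides (prior_prior, unet, movq, ...) take effect only when
# the nested prior_pipe/decoder_pipe configs are not supplied.
config = RBLNKandinskyV22CombinedPipelineConfig(
    batch_size=1,
    img_height=768,
    img_width=768,
    unet={"batch_size": 2},         # decoder-side override (dict assumed accepted)
    prior_prior={"batch_size": 2},  # prior-side override (dict assumed accepted)
)
```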
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
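The `rbln_` prefix handling described above can be sketched as follows. This is a hypothetical helper written for illustration, not the library's internal function:

```python
def strip_rbln_prefix(overrides):
    """Mirror the documented override handling: keys starting with
    'rbln_' have the prefix removed before updating the configuration.
    (Hypothetical helper for illustration only.)"""
    return {
        (k[len("rbln_"):] if k.startswith("rbln_") else k): v
        for k, v in overrides.items()
    }

print(strip_rbln_prefix({"rbln_batch_size": 2, "npu": "RBLN-CA12"}))
# → {'batch_size': 2, 'npu': 'RBLN-CA12'}
```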
RBLNKandinskyV22CombinedPipelineConfig¶
Bases: RBLNKandinskyV22CombinedPipelineBaseConfig
Configuration class for the Kandinsky V2.2 combined text-to-image pipeline.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, height=None, width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Initialize a configuration for combined Kandinsky 2.2 pipelines optimized for RBLN NPU.
This configuration integrates both the prior and decoder components of Kandinsky 2.2 into a unified pipeline, allowing for end-to-end text-to-image generation in a single model. It combines the text/image encoding, prior mapping, and diffusion steps together.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prior_pipe | Optional[RBLNKandinskyV22PriorPipelineConfig] | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | None |
decoder_pipe | Optional[RBLNKandinskyV22PipelineConfig] | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
height | Optional[int] | Height of the generated images. | None |
width | Optional[int] | Width of the generated images. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
prior_prior | Optional[RBLNPriorTransformerConfig] | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | None |
prior_image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Direct configuration for the image encoder. Used if prior_pipe is not provided. | None |
prior_text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Direct configuration for the text encoder. Used if prior_pipe is not provided. | None |
unet | Optional[RBLNUNet2DConditionModelConfig] | Direct configuration for the UNet. Used if decoder_pipe is not provided. | None |
movq | Optional[RBLNVQModelConfig] | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | None |
kwargs | Any | Additional arguments passed to the parent RBLNModelConfig. | {} |
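As discussed in the batch-size guidance earlier in this document, classifier-free guidance (guidance_scale > 1.0) concatenates the conditional and unconditional inputs, doubling the effective batch size of the UNet and Prior at runtime. A minimal sketch of that relationship, as a hypothetical helper rather than library code:

```python
def runtime_batch_size(compiled_batch_size, guidance_scale):
    """Effective UNet/Prior batch size at runtime. With classifier-free
    guidance (scale > 1.0), conditional and unconditional inputs are
    concatenated, doubling the batch. (Illustrative helper only; the
    compiled batch size must match this value on RBLN NPUs.)"""
    if guidance_scale > 1.0:
        return compiled_batch_size * 2
    return compiled_batch_size

print(runtime_batch_size(1, 4.0))  # → 2
print(runtime_batch_size(1, 1.0))  # → 1
```

Because RBLN NPUs compile static graphs, the UNet/Prior must be compiled with the batch size this function returns, or inference fails.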
load(path, **kwargs) classmethod ¶
Load a RBLNModelConfig from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the RBLNModelConfig file or directory containing the config file. | required |
kwargs | Any | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | {} |
Returns:
Name | Type | Description |
---|---|---|
RBLNModelConfig | RBLNModelConfig | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNKandinskyV22InpaintCombinedPipelineConfig¶
Bases: RBLNKandinskyV22CombinedPipelineBaseConfig
Configuration class for the Kandinsky V2.2 combined inpainting pipeline.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, height=None, width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Initialize a configuration for combined Kandinsky 2.2 pipelines optimized for RBLN NPU.
This configuration integrates both the prior and decoder components of Kandinsky 2.2 into a unified pipeline, allowing for end-to-end text-guided inpainting in a single model. It combines the text/image encoding, prior mapping, and diffusion steps together.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prior_pipe | Optional[RBLNKandinskyV22PriorPipelineConfig] | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | None |
decoder_pipe | Optional[RBLNKandinskyV22PipelineConfig] | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
height | Optional[int] | Height of the generated images. | None |
width | Optional[int] | Width of the generated images. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
prior_prior | Optional[RBLNPriorTransformerConfig] | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | None |
prior_image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Direct configuration for the image encoder. Used if prior_pipe is not provided. | None |
prior_text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Direct configuration for the text encoder. Used if prior_pipe is not provided. | None |
unet | Optional[RBLNUNet2DConditionModelConfig] | Direct configuration for the UNet. Used if decoder_pipe is not provided. | None |
movq | Optional[RBLNVQModelConfig] | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | None |
kwargs | Any | Additional arguments passed to the parent RBLNModelConfig. | {} |
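The parameter table above notes that image_size cannot be combined with img_height/img_width. That constraint can be sketched as follows, using a hypothetical helper written purely for illustration:

```python
def resolve_image_size(image_size=None, img_height=None, img_width=None):
    """Illustrates the documented constraint that image_size cannot be
    used together with img_height/img_width. (Hypothetical helper, not
    part of the library.)"""
    if image_size is not None and (img_height is not None or img_width is not None):
        raise ValueError(
            "image_size cannot be used together with img_height/img_width"
        )
    if image_size is not None:
        return tuple(image_size)  # (height, width) as a single tuple
    return (img_height, img_width)

print(resolve_image_size(img_height=768, img_width=512))  # → (768, 512)
```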
load(path, **kwargs) classmethod ¶
Load a RBLNModelConfig from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the RBLNModelConfig file or directory containing the config file. | required |
kwargs | Any | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | {} |
Returns:
Name | Type | Description |
---|---|---|
RBLNModelConfig | RBLNModelConfig | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNKandinskyV22Img2ImgCombinedPipelineConfig¶
Bases: RBLNKandinskyV22CombinedPipelineBaseConfig
Configuration class for the Kandinsky V2.2 combined image-to-image pipeline.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, height=None, width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Initialize a configuration for combined Kandinsky 2.2 pipelines optimized for RBLN NPU.
This configuration integrates both the prior and decoder components of Kandinsky 2.2 into a unified pipeline, allowing for end-to-end image-to-image generation in a single model. It combines the text/image encoding, prior mapping, and diffusion steps together.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prior_pipe | Optional[RBLNKandinskyV22PriorPipelineConfig] | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | None |
decoder_pipe | Optional[RBLNKandinskyV22PipelineConfig] | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
height | Optional[int] | Height of the generated images. | None |
width | Optional[int] | Width of the generated images. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
prior_prior | Optional[RBLNPriorTransformerConfig] | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | None |
prior_image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Direct configuration for the image encoder. Used if prior_pipe is not provided. | None |
prior_text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Direct configuration for the text encoder. Used if prior_pipe is not provided. | None |
unet | Optional[RBLNUNet2DConditionModelConfig] | Direct configuration for the UNet. Used if decoder_pipe is not provided. | None |
movq | Optional[RBLNVQModelConfig] | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | None |
kwargs | Any | Additional arguments passed to the parent RBLNModelConfig. | {} |
load(path, **kwargs) classmethod ¶
Load a RBLNModelConfig from a path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the RBLNModelConfig file or directory containing the config file. | required |
kwargs | Any | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | {} |
Returns:
Name | Type | Description |
---|---|---|
RBLNModelConfig | RBLNModelConfig | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.