Kandinsky V2.2¶
Kandinsky V2.2 is a text-to-image latent diffusion model. RBLN NPUs can accelerate Kandinsky V2.2 pipelines through Optimum RBLN.
Supported Pipelines¶
Optimum RBLN supports several Kandinsky V2.2 pipelines:
- Text-to-Image: generates an image from a text prompt (uses Prior + Decoder)
- Image-to-Image: modifies an existing image based on a text prompt (uses Prior + Img2Img Decoder)
- Inpainting: fills the masked region of an image according to a text prompt (uses Prior + Inpaint Decoder)
Key Classes¶
- RBLNKandinskyV22PriorPipeline: Prior-stage pipeline (text/image -> image embeddings)
- RBLNKandinskyV22PriorPipelineConfig: Configuration for the Prior pipeline
- RBLNKandinskyV22Pipeline: Text-to-image Decoder pipeline (image embeddings -> image)
- RBLNKandinskyV22PipelineConfig: Configuration for the text-to-image Decoder pipeline
- RBLNKandinskyV22Img2ImgPipeline: Image-to-image Decoder pipeline
- RBLNKandinskyV22Img2ImgPipelineConfig: Configuration for the image-to-image Decoder pipeline
- RBLNKandinskyV22InpaintPipeline: Inpainting Decoder pipeline
- RBLNKandinskyV22InpaintPipelineConfig: Configuration for the inpainting Decoder pipeline
- RBLNKandinskyV22CombinedPipeline: Combined pipeline (Prior + text-to-image Decoder)
- RBLNKandinskyV22CombinedPipelineConfig: Configuration for the combined pipeline
- RBLNKandinskyV22Img2ImgCombinedPipeline: Combined pipeline (Prior + image-to-image Decoder)
- RBLNKandinskyV22Img2ImgCombinedPipelineConfig: Configuration for the combined pipeline (Prior + image-to-image Decoder)
- RBLNKandinskyV22InpaintCombinedPipeline: Combined pipeline (Prior + inpainting Decoder)
- RBLNKandinskyV22InpaintCombinedPipelineConfig: Configuration for the combined pipeline (Prior + inpainting Decoder)
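Important: Batch Size Configuration for Guidance Scale¶
Batch size and guidance scale
When Kandinsky V2.2 runs with a guidance scale > 1.0 (the default), the effective batch size of both the UNet and the Prior doubles at runtime because of classifier-free guidance.
Because RBLN NPUs use static graph compilation, the batch size these components are compiled with must match their runtime batch size. Otherwise, inference fails with an error.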
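The doubling rule can be sketched in plain Python. The helper name `compiled_batch_size` is hypothetical and not part of Optimum RBLN; it only illustrates the arithmetic that the compile-time batch size has to satisfy.

```python
def compiled_batch_size(pipeline_batch_size: int, guidance_scale: float) -> int:
    """Return the batch size the UNet/Prior should be compiled with.

    Hypothetical helper for illustration only: classifier-free guidance
    (guidance_scale > 1.0) runs the conditional and unconditional inputs
    in one batch, so the effective batch size doubles.
    """
    if pipeline_batch_size < 1:
        raise ValueError("pipeline_batch_size must be a positive integer")
    uses_cfg = guidance_scale > 1.0
    return pipeline_batch_size * 2 if uses_cfg else pipeline_batch_size


# One image per call with guidance enabled: compile UNet/Prior with batch 2.
print(compiled_batch_size(1, guidance_scale=4.0))
# Guidance disabled (scale of exactly 1.0): the batch size is unchanged.
print(compiled_batch_size(1, guidance_scale=1.0))
```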
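Default Behavior¶
If you do not explicitly specify the batch size of the UNet or the Prior, Optimum RBLN:
- Assumes you will use the default guidance scale (greater than 1.0)
- Automatically sets the UNet and Prior batch sizes to twice the pipeline batch size
If you plan to use the default guidance scale, this automatic configuration works as is. If you plan to use a different guidance scale or need finer control, set the batch sizes explicitly.
Example: Setting Batch Sizes Explicitly (Guidance Scale = 1.0)¶
If you plan to use a guidance scale of exactly 1.0 (no classifier-free guidance), explicitly set the batch sizes to match the inference batch size.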
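A minimal sketch of such a configuration, assuming `rbln_config` accepts nested dictionaries whose keys mirror the config class parameters listed in the API reference (`batch_size`, `guidance_scale`, `prior_prior`, `unet`). Both helper names are hypothetical. Compiling requires Optimum RBLN and an RBLN NPU, so the import is deferred into the function that needs it.

```python
from typing import Any, Dict


def build_no_guidance_config(batch_size: int = 1) -> Dict[str, Any]:
    """Build an rbln_config dict for guidance_scale == 1.0 (hypothetical helper).

    With classifier-free guidance disabled, the Prior and the UNet run with
    the same batch size as the pipeline, so the values are set explicitly
    instead of relying on the automatic doubling.
    """
    return {
        "batch_size": batch_size,
        "guidance_scale": 1.0,
        "prior_prior": {"batch_size": batch_size},  # Prior transformer
        "unet": {"batch_size": batch_size},  # Decoder UNet
    }


def compile_combined_pipeline(model_id: str, batch_size: int = 1):
    """Compile the combined pipeline; needs optimum-rbln and an RBLN NPU."""
    # Deferred import so this sketch can be read and imported without the SDK.
    from optimum.rbln import RBLNKandinskyV22CombinedPipeline

    return RBLNKandinskyV22CombinedPipeline.from_pretrained(
        model_id,
        export=True,  # compile from the PyTorch checkpoint
        rbln_config=build_no_guidance_config(batch_size),
    )
```

At inference time the matching guidance scale would also have to be passed to the call, for example `pipe(prompt, guidance_scale=1.0, prior_guidance_scale=1.0)`, where `prior_guidance_scale` is the argument name used by the diffusers combined pipeline.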
Usage Examples¶
Method 1: Using Separate Prior and Decoder Pipelines¶
This approach gives you more control over the intermediate image embeddings.
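The two-stage flow can be sketched as follows. The function names, the 768x768 resolution, and the use of `img_height`/`img_width` as `rbln_config` dictionary keys are assumptions for illustration. The call signatures (`image_embeds`, `negative_image_embeds`, `height`, `width`) follow the diffusers Kandinsky V2.2 pipelines that the RBLN classes wrap.

```python
from typing import Any


def load_separate_pipelines(prior_id: str, decoder_id: str):
    """Compile both stages; needs optimum-rbln and an RBLN NPU."""
    # Deferred import so this sketch can be imported without the SDK.
    from optimum.rbln import (
        RBLNKandinskyV22Pipeline,
        RBLNKandinskyV22PriorPipeline,
    )

    prior = RBLNKandinskyV22PriorPipeline.from_pretrained(prior_id, export=True)
    decoder = RBLNKandinskyV22Pipeline.from_pretrained(
        decoder_id,
        export=True,
        rbln_config={"img_height": 768, "img_width": 768},  # assumed size
    )
    return prior, decoder


def generate(prior: Any, decoder: Any, prompt: str, height: int = 768, width: int = 768):
    """Run the Prior, then the Decoder, exposing the intermediate embeddings."""
    prior_out = prior(prompt)
    # The embeddings could be inspected or modified here before decoding.
    image_embeds = prior_out.image_embeds
    negative_image_embeds = prior_out.negative_image_embeds
    result = decoder(
        image_embeds=image_embeds,
        negative_image_embeds=negative_image_embeds,
        height=height,
        width=width,
    )
    return result.images[0]
```

A call might look like `prior, decoder = load_separate_pipelines("kandinsky-community/kandinsky-2-2-prior", "kandinsky-community/kandinsky-2-2-decoder")` followed by `generate(prior, decoder, "a red fox in the snow")`. The height and width passed at inference should match the size the decoder was compiled for.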
Method 2: Using a Combined Pipeline¶
A combined pipeline integrates the Prior and the Decoder into a single workflow.
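A minimal sketch of the combined workflow, under the same assumptions as any other example on this page: the helper names and the 768x768 size are illustrative, and compilation needs Optimum RBLN plus an RBLN NPU.

```python
from typing import Any


def load_combined_pipeline(model_id: str, height: int = 768, width: int = 768):
    """Compile Prior + Decoder in one step; needs optimum-rbln and an RBLN NPU."""
    # Deferred import so this sketch can be imported without the SDK.
    from optimum.rbln import RBLNKandinskyV22CombinedPipeline

    return RBLNKandinskyV22CombinedPipeline.from_pretrained(
        model_id,
        export=True,
        rbln_config={"img_height": height, "img_width": width},
    )


def generate_combined(
    pipe: Any,
    prompt: str,
    height: int = 768,
    width: int = 768,
    steps: int = 50,
):
    """Generate one image; the Prior and the Decoder run inside a single call."""
    result = pipe(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=steps,
    )
    return result.images[0]
```

Compared with the two-stage approach, the intermediate image embeddings are not exposed here, which keeps the calling code shorter at the cost of that control.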
API Reference¶
Classes¶
RBLNKandinskyV22PriorPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22PriorPipeline
RBLN wrapper for Kandinsky V2.2 Prior pipeline.
Functions¶
from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
- When export=False: Loads an already compiled RBLN model from model_id without recompilation
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
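The two operating modes lend themselves to a compile-once, reload-later pattern. The sketch below is an assumption-laden illustration: `from_pretrained_kwargs` and `load_prior_pipeline` are hypothetical helpers, and treating "the directory exists" as "the model is already compiled" is a simplification that would not detect an incomplete compilation.

```python
from pathlib import Path
from typing import Any, Dict


def from_pretrained_kwargs(hub_id: str, compiled_dir: str) -> Dict[str, Any]:
    """Pick from_pretrained arguments for the two modes (hypothetical helper).

    - compiled_dir missing: export=True, compile hub_id and save artifacts there
    - compiled_dir exists:  export=False, load the compiled model from it
    """
    if Path(compiled_dir).is_dir():
        return {"model_id": compiled_dir, "export": False}
    return {"model_id": hub_id, "export": True, "model_save_dir": compiled_dir}


def load_prior_pipeline(hub_id: str, compiled_dir: str):
    """Compile on first use, reload afterwards; needs optimum-rbln and an NPU."""
    # Deferred import so this sketch can be imported without the SDK.
    from optimum.rbln import RBLNKandinskyV22PriorPipeline

    return RBLNKandinskyV22PriorPipeline.from_pretrained(
        **from_pretrained_kwargs(hub_id, compiled_dir)
    )
```

Because compilation is the expensive step, separating the decision logic from the loading call also makes the mode selection easy to check without hardware.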
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model ID or path to the pretrained model to load. Can be either: […] | required |
export | bool | If True, takes a PyTorch model from […] | False |
model_save_dir | Optional[PathLike] | Directory to save the compiled model artifacts. Only used when […] | None |
rbln_config | Dict[str, Any] | Configuration options for RBLN compilation. Can include settings for specific submodules such as […] | {} |
lora_ids | Optional[Union[str, List[str]]] | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when […] | None |
lora_weights_names | Optional[Union[str, List[str]]] | Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when […] | None |
lora_scales | Optional[Union[float, List[float]]] | Scaling factor(s) to apply to the LoRA adapter(s). Only used when […] | None |
**kwargs | Dict[str, Any] | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | {} |
Returns:
Type | Description |
---|---|
Self | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22Pipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22Pipeline
RBLN wrapper for Kandinsky V2.2 text-to-image pipeline.
Functions¶
from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
- When export=False: Loads an already compiled RBLN model from model_id without recompilation
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model ID or path to the pretrained model to load. Can be either: […] | required |
export | bool | If True, takes a PyTorch model from […] | False |
model_save_dir | Optional[PathLike] | Directory to save the compiled model artifacts. Only used when […] | None |
rbln_config | Dict[str, Any] | Configuration options for RBLN compilation. Can include settings for specific submodules such as […] | {} |
lora_ids | Optional[Union[str, List[str]]] | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when […] | None |
lora_weights_names | Optional[Union[str, List[str]]] | Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when […] | None |
lora_scales | Optional[Union[float, List[float]]] | Scaling factor(s) to apply to the LoRA adapter(s). Only used when […] | None |
**kwargs | Dict[str, Any] | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | {} |
Returns:
Type | Description |
---|---|
Self | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22Img2ImgPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22Img2ImgPipeline
RBLN wrapper for Kandinsky V2.2 image-to-image pipeline.
Functions¶
from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
- When export=False: Loads an already compiled RBLN model from model_id without recompilation
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model ID or path to the pretrained model to load. Can be either: […] | required |
export | bool | If True, takes a PyTorch model from […] | False |
model_save_dir | Optional[PathLike] | Directory to save the compiled model artifacts. Only used when […] | None |
rbln_config | Dict[str, Any] | Configuration options for RBLN compilation. Can include settings for specific submodules such as […] | {} |
lora_ids | Optional[Union[str, List[str]]] | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when […] | None |
lora_weights_names | Optional[Union[str, List[str]]] | Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when […] | None |
lora_scales | Optional[Union[float, List[float]]] | Scaling factor(s) to apply to the LoRA adapter(s). Only used when […] | None |
**kwargs | Dict[str, Any] | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | {} |
Returns:
Type | Description |
---|---|
Self | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22InpaintPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22InpaintPipeline
RBLN wrapper for Kandinsky V2.2 inpainting pipeline.
Functions¶
from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
- When export=False: Loads an already compiled RBLN model from model_id without recompilation
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model ID or path to the pretrained model to load. Can be either: […] | required |
export | bool | If True, takes a PyTorch model from […] | False |
model_save_dir | Optional[PathLike] | Directory to save the compiled model artifacts. Only used when […] | None |
rbln_config | Dict[str, Any] | Configuration options for RBLN compilation. Can include settings for specific submodules such as […] | {} |
lora_ids | Optional[Union[str, List[str]]] | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when […] | None |
lora_weights_names | Optional[Union[str, List[str]]] | Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when […] | None |
lora_scales | Optional[Union[float, List[float]]] | Scaling factor(s) to apply to the LoRA adapter(s). Only used when […] | None |
**kwargs | Dict[str, Any] | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | {} |
Returns:
Type | Description |
---|---|
Self | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22CombinedPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22CombinedPipeline
RBLN wrapper for Kandinsky V2.2 Combined (Prior + Text-to-Image Decoder) pipeline.
Functions¶
from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
- When export=False: Loads an already compiled RBLN model from model_id without recompilation
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model ID or path to the pretrained model to load. Can be either: […] | required |
export | bool | If True, takes a PyTorch model from […] | False |
model_save_dir | Optional[PathLike] | Directory to save the compiled model artifacts. Only used when […] | None |
rbln_config | Dict[str, Any] | Configuration options for RBLN compilation. Can include settings for specific submodules such as […] | {} |
lora_ids | Optional[Union[str, List[str]]] | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when […] | None |
lora_weights_names | Optional[Union[str, List[str]]] | Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when […] | None |
lora_scales | Optional[Union[float, List[float]]] | Scaling factor(s) to apply to the LoRA adapter(s). Only used when […] | None |
**kwargs | Dict[str, Any] | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | {} |
Returns:
Type | Description |
---|---|
Self | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22Img2ImgCombinedPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22Img2ImgCombinedPipeline
RBLN wrapper for Kandinsky V2.2 Combined (Prior + Image-to-Image Decoder) pipeline.
Functions¶
from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
- When export=False: Loads an already compiled RBLN model from model_id without recompilation
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model ID or path to the pretrained model to load. Can be either: […] | required |
export | bool | If True, takes a PyTorch model from […] | False |
model_save_dir | Optional[PathLike] | Directory to save the compiled model artifacts. Only used when […] | None |
rbln_config | Dict[str, Any] | Configuration options for RBLN compilation. Can include settings for specific submodules such as […] | {} |
lora_ids | Optional[Union[str, List[str]]] | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when […] | None |
lora_weights_names | Optional[Union[str, List[str]]] | Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when […] | None |
lora_scales | Optional[Union[float, List[float]]] | Scaling factor(s) to apply to the LoRA adapter(s). Only used when […] | None |
**kwargs | Dict[str, Any] | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | {} |
Returns:
Type | Description |
---|---|
Self | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
RBLNKandinskyV22InpaintCombinedPipeline¶
Bases: RBLNDiffusionMixin, KandinskyV22InpaintCombinedPipeline
RBLN wrapper for Kandinsky V2.2 Combined (Prior + Inpainting Decoder) pipeline.
Functions¶
from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
- When export=False: Loads an already compiled RBLN model from model_id without recompilation
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | str | The model ID or path to the pretrained model to load. Can be either: […] | required |
export | bool | If True, takes a PyTorch model from […] | False |
model_save_dir | Optional[PathLike] | Directory to save the compiled model artifacts. Only used when […] | None |
rbln_config | Dict[str, Any] | Configuration options for RBLN compilation. Can include settings for specific submodules such as […] | {} |
lora_ids | Optional[Union[str, List[str]]] | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when […] | None |
lora_weights_names | Optional[Union[str, List[str]]] | Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when […] | None |
lora_scales | Optional[Union[float, List[float]]] | Scaling factor(s) to apply to the LoRA adapter(s). Only used when […] | None |
**kwargs | Dict[str, Any] | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | {} |
Returns:
Type | Description |
---|---|
Self | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
Classes¶
RBLNKandinskyV22PipelineBaseConfig¶
Bases: RBLNModelConfig
Base configuration class for Kandinsky V2.2 decoder pipelines.
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | Optional[RBLNUNet2DConditionModelConfig] | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | None |
movq | Optional[RBLNVQModelConfig] | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
Raises:
Type | Description |
---|---|
ValueError | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
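The mutual exclusion between `image_size` and `img_height`/`img_width` can be illustrated with a small stand-alone function. This is a hypothetical restatement of the documented rule, not the library's implementation; the `(height, width)` ordering and the handling of a lone `img_height` or `img_width` are assumptions made for the sketch.

```python
from typing import Optional, Tuple


def resolve_image_size(
    image_size: Optional[Tuple[int, int]] = None,
    img_height: Optional[int] = None,
    img_width: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return one (height, width) pair, or None if no size was requested."""
    has_hw = img_height is not None or img_width is not None
    if image_size is not None and has_hw:
        # Documented rule: the two ways of giving a size cannot be mixed.
        raise ValueError("image_size cannot be combined with img_height/img_width")
    if image_size is not None:
        return (image_size[0], image_size[1])
    if not has_hw:
        return None
    if img_height is None or img_width is None:
        # Sketch-only choice: require both values rather than guessing one.
        raise ValueError("img_height and img_width must be given together")
    return (img_height, img_width)
```

For example, `resolve_image_size(img_height=768, img_width=512)` and `resolve_image_size(image_size=(768, 512))` describe the same target size, while passing both forms at once raises `ValueError`.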
RBLNKandinskyV22PipelineConfig¶
Bases: RBLNKandinskyV22PipelineBaseConfig
Configuration class for the Kandinsky V2.2 text-to-image decoder pipeline.
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | Optional[RBLNUNet2DConditionModelConfig] | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | None |
movq | Optional[RBLNVQModelConfig] | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
Raises:
Type | Description |
---|---|
ValueError | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
RBLNKandinskyV22Img2ImgPipelineConfig¶
Bases: RBLNKandinskyV22PipelineBaseConfig
Configuration class for the Kandinsky V2.2 image-to-image decoder pipeline.
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | Optional[RBLNUNet2DConditionModelConfig] | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | None |
movq | Optional[RBLNVQModelConfig] | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
Raises:
Type | Description |
---|---|
ValueError | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
RBLNKandinskyV22InpaintPipelineConfig¶
Bases: RBLNKandinskyV22PipelineBaseConfig
Configuration class for the Kandinsky V2.2 inpainting decoder pipeline.
Functions¶
__init__(unet=None, movq=None, *, sample_size=None, batch_size=None, guidance_scale=None, image_size=None, img_height=None, img_width=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | Optional[RBLNUNet2DConditionModelConfig] | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | None |
movq | Optional[RBLNVQModelConfig] | Configuration for the MoVQ (VQ-GAN) model component. Initialized as RBLNVQModelConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
Raises:
Type | Description |
---|---|
ValueError | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
RBLNKandinskyV22PriorPipelineConfig¶
Bases: RBLNModelConfig
Configuration class for the Kandinsky V2.2 Prior pipeline.
Functions¶
__init__(text_encoder=None, image_encoder=None, prior=None, *, batch_size=None, guidance_scale=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Configuration for the text encoder component. Initialized as RBLNCLIPTextModelWithProjectionConfig if not provided. | None |
image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Configuration for the image encoder component. Initialized as RBLNCLIPVisionModelWithProjectionConfig if not provided. | None |
prior | Optional[RBLNPriorTransformerConfig] | Configuration for the prior transformer component. Initialized as RBLNPriorTransformerConfig if not provided. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
Note
When guidance_scale > 1.0, the prior batch size is automatically doubled to accommodate classifier-free guidance.
RBLNKandinskyV22CombinedPipelineBaseConfig¶
Bases: RBLNModelConfig
Base configuration class for Kandinsky V2.2 combined pipelines.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prior_pipe | Optional[RBLNKandinskyV22PriorPipelineConfig] | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | None |
decoder_pipe | Optional[RBLNKandinskyV22PipelineConfig] | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
prior_prior | Optional[RBLNPriorTransformerConfig] | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | None |
prior_image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Direct configuration for the image encoder. Used if prior_pipe is not provided. | None |
prior_text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Direct configuration for the text encoder. Used if prior_pipe is not provided. | None |
unet | Optional[RBLNUNet2DConditionModelConfig] | Direct configuration for the UNet. Used if decoder_pipe is not provided. | None |
movq | Optional[RBLNVQModelConfig] | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
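The relationship between the "direct" parameters (`prior_prior`, `prior_image_encoder`, `prior_text_encoder`, `unet`, `movq`) and the nested `prior_pipe`/`decoder_pipe` configurations can be pictured with a plain-dictionary sketch. `nest_combined_config` is a hypothetical helper written for this page, not library code; in particular, silently dropping a direct entry when the corresponding pipe config is present is a choice made here to mirror "Used if prior_pipe is not provided", and the library may handle that case differently.

```python
from typing import Any, Dict

# Direct parameter name -> submodule name inside the nested pipeline config.
_PRIOR_KEYS = {
    "prior_prior": "prior",
    "prior_image_encoder": "image_encoder",
    "prior_text_encoder": "text_encoder",
}
_DECODER_KEYS = {"unet": "unet", "movq": "movq"}


def nest_combined_config(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Fold direct submodule settings into prior_pipe / decoder_pipe."""
    cfg = dict(flat)  # shallow copy so the caller's dict is left untouched
    groups = (("prior_pipe", _PRIOR_KEYS), ("decoder_pipe", _DECODER_KEYS))
    for pipe_key, mapping in groups:
        # Collect (and remove) every direct setting that belongs to this pipe.
        direct = {sub: cfg.pop(key) for key, sub in mapping.items() if key in cfg}
        # Direct settings only apply when the pipe config itself is absent.
        if pipe_key not in cfg and direct:
            cfg[pipe_key] = direct
    return cfg
```

Under these assumptions, `{"prior_prior": {"batch_size": 2}}` and `{"prior_pipe": {"prior": {"batch_size": 2}}}` describe the same Prior transformer setting.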
RBLNKandinskyV22CombinedPipelineConfig¶
Bases: RBLNKandinskyV22CombinedPipelineBaseConfig
Configuration class for the Kandinsky V2.2 combined text-to-image pipeline.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prior_pipe | Optional[RBLNKandinskyV22PriorPipelineConfig] | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | None |
decoder_pipe | Optional[RBLNKandinskyV22PipelineConfig] | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
prior_prior | Optional[RBLNPriorTransformerConfig] | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | None |
prior_image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Direct configuration for the image encoder. Used if prior_pipe is not provided. | None |
prior_text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Direct configuration for the text encoder. Used if prior_pipe is not provided. | None |
unet | Optional[RBLNUNet2DConditionModelConfig] | Direct configuration for the UNet. Used if decoder_pipe is not provided. | None |
movq | Optional[RBLNVQModelConfig] | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
RBLNKandinskyV22InpaintCombinedPipelineConfig¶
Bases: RBLNKandinskyV22CombinedPipelineBaseConfig
Configuration class for the Kandinsky V2.2 combined inpainting pipeline.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prior_pipe | Optional[RBLNKandinskyV22PriorPipelineConfig] | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | None |
decoder_pipe | Optional[RBLNKandinskyV22PipelineConfig] | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
prior_prior | Optional[RBLNPriorTransformerConfig] | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | None |
prior_image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Direct configuration for the image encoder. Used if prior_pipe is not provided. | None |
prior_text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Direct configuration for the text encoder. Used if prior_pipe is not provided. | None |
unet | Optional[RBLNUNet2DConditionModelConfig] | Direct configuration for the UNet. Used if decoder_pipe is not provided. | None |
movq | Optional[RBLNVQModelConfig] | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
RBLNKandinskyV22Img2ImgCombinedPipelineConfig¶
Bases: RBLNKandinskyV22CombinedPipelineBaseConfig
Configuration class for the Kandinsky V2.2 combined image-to-image pipeline.
Functions¶
__init__(prior_pipe=None, decoder_pipe=None, *, sample_size=None, image_size=None, batch_size=None, img_height=None, img_width=None, guidance_scale=None, prior_prior=None, prior_image_encoder=None, prior_text_encoder=None, unet=None, movq=None, **kwargs)¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prior_pipe | Optional[RBLNKandinskyV22PriorPipelineConfig] | Configuration for the prior pipeline. Initialized as RBLNKandinskyV22PriorPipelineConfig if not provided. | None |
decoder_pipe | Optional[RBLNKandinskyV22PipelineConfig] | Configuration for the decoder pipeline. Initialized as RBLNKandinskyV22PipelineConfig if not provided. | None |
sample_size | Optional[Tuple[int, int]] | Spatial dimensions for the UNet model. | None |
image_size | Optional[Tuple[int, int]] | Dimensions for the generated images. Cannot be used together with img_height/img_width. | None |
batch_size | Optional[int] | Batch size for inference, applied to all submodules. | None |
img_height | Optional[int] | Height of the generated images. | None |
img_width | Optional[int] | Width of the generated images. | None |
guidance_scale | Optional[float] | Scale for classifier-free guidance. | None |
prior_prior | Optional[RBLNPriorTransformerConfig] | Direct configuration for the prior transformer. Used if prior_pipe is not provided. | None |
prior_image_encoder | Optional[RBLNCLIPVisionModelWithProjectionConfig] | Direct configuration for the image encoder. Used if prior_pipe is not provided. | None |
prior_text_encoder | Optional[RBLNCLIPTextModelWithProjectionConfig] | Direct configuration for the text encoder. Used if prior_pipe is not provided. | None |
unet | Optional[RBLNUNet2DConditionModelConfig] | Direct configuration for the UNet. Used if decoder_pipe is not provided. | None |
movq | Optional[RBLNVQModelConfig] | Direct configuration for the MoVQ (VQ-GAN) model. Used if decoder_pipe is not provided. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |