Stable Diffusion XL ControlNet¶
ControlNet can also be applied to the more advanced Stable Diffusion XL (SDXL) model, enabling high-resolution image generation with precise structural guidance from a conditioning image. Optimum RBLN provides accelerated SDXL ControlNet pipelines for RBLN NPUs.
Supported Pipelines¶
- Text-to-Image with SDXL ControlNet: generate high-resolution images from a text prompt, guided by a control image, using an SDXL-based model.
- Image-to-Image with SDXL ControlNet: modify an existing image based on a text prompt and a control image, using an SDXL-based model.
Important: Batch Size Configuration with Guidance Scale¶
Batch size and guidance scale (SDXL)
As with other SDXL pipelines, running a ControlNet SDXL pipeline with guidance_scale > 1.0 doubles the effective runtime batch size of the UNet and ControlNet models.
Make sure the batch_size specified in the unet and controlnet sections of RBLNStableDiffusionXLControlNetPipelineConfig matches the expected runtime batch size (typically twice the inference batch size when guidance_scale > 1.0). If omitted, it is doubled automatically based on the pipeline's batch_size, as shown in the sketch below.
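For instance, a minimal sketch using only the documented config parameters (the image dimensions are illustrative); setting guidance_scale in the config lets the compiler derive the doubled UNet/ControlNet runtime batch size automatically:

```python
from optimum.rbln import RBLNStableDiffusionXLControlNetPipelineConfig

# Inference batch size is 1, but guidance_scale > 1.0 doubles the runtime
# batch size of the UNet and ControlNet at compile time.
config = RBLNStableDiffusionXLControlNetPipelineConfig(
    batch_size=1,
    img_height=1024,
    img_width=1024,
    guidance_scale=5.0,
)
```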
API Reference¶
Classes¶
RBLNStableDiffusionXLControlNetPipeline¶
Bases: RBLNDiffusionMixin, StableDiffusionXLControlNetPipeline
RBLN-accelerated implementation of Stable Diffusion XL pipeline with ControlNet for high-resolution guided text-to-image generation.
This pipeline compiles Stable Diffusion XL and ControlNet models to run efficiently on RBLN NPUs, enabling high-performance inference for generating high-quality images with precise structural control and enhanced detail preservation.
Functions¶
__call__(prompt=None, prompt_2=None, image=None, height=None, width=None, num_inference_steps=50, denoising_end=None, guidance_scale=5.0, negative_prompt=None, negative_prompt_2=None, num_images_per_prompt=1, eta=0.0, generator=None, latents=None, prompt_embeds=None, negative_prompt_embeds=None, pooled_prompt_embeds=None, negative_pooled_prompt_embeds=None, ip_adapter_image=None, ip_adapter_image_embeds=None, output_type='pil', return_dict=True, cross_attention_kwargs=None, controlnet_conditioning_scale=1.0, guess_mode=False, control_guidance_start=0.0, control_guidance_end=1.0, original_size=None, crops_coords_top_left=(0, 0), target_size=None, negative_original_size=None, negative_crops_coords_top_left=(0, 0), negative_target_size=None, clip_skip=None, callback_on_step_end=None, callback_on_step_end_tensor_inputs=['latents'], **kwargs)¶
The call function to the pipeline for generation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` or `List[str]`, *optional* | The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`. | `None` |
| `prompt_2` | `str` or `List[str]`, *optional* | The prompt or prompts to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is used in both text-encoders. | `None` |
| `image` | `torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]` | The ControlNet input condition to provide guidance to the `unet` for generation. | required |
| `height` | `int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor` | The height in pixels of the generated image. Anything below 512 pixels won't work well for stabilityai/stable-diffusion-xl-base-1.0 and checkpoints that are not specifically fine-tuned on low resolutions. | `None` |
| `width` | `int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor` | The width in pixels of the generated image. Anything below 512 pixels won't work well for stabilityai/stable-diffusion-xl-base-1.0 and checkpoints that are not specifically fine-tuned on low resolutions. | `None` |
| `num_inference_steps` | `int`, *optional*, defaults to 50 | The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | `50` |
| `denoising_end` | `float`, *optional* | When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise as determined by the discrete timesteps selected by the scheduler. The `denoising_end` parameter should ideally be utilized when this pipeline forms a part of a "Mixture of Denoisers" multi-pipeline setup, as elaborated in Refining the Image Output. | `None` |
| `guidance_scale` | `float`, *optional*, defaults to 5.0 | A higher guidance scale value encourages the model to generate images closely linked to the text `prompt`, usually at the expense of lower image quality. Guidance is enabled when `guidance_scale > 1`. | `5.0` |
| `negative_prompt` | `str` or `List[str]`, *optional* | The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`). | `None` |
| `negative_prompt_2` | `str` or `List[str]`, *optional* | The prompt or prompts to guide what to not include in image generation. This is sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders. | `None` |
| `num_images_per_prompt` | `int`, *optional*, defaults to 1 | The number of images to generate per prompt. | `1` |
| `eta` | `float`, *optional*, defaults to 0.0 | Corresponds to parameter eta (η) from the DDIM paper. Only applies to the `DDIMScheduler` and is ignored in other schedulers. | `0.0` |
| `generator` | `torch.Generator` or `List[torch.Generator]`, *optional* | A `torch.Generator` to make generation deterministic. | `None` |
| `latents` | `torch.FloatTensor`, *optional* | Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random `generator`. | `None` |
| `prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the `prompt` input argument. | `None` |
| `negative_prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument. | `None` |
| `pooled_prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated pooled text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, pooled text embeddings are generated from the `prompt` input argument. | `None` |
| `negative_pooled_prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, pooled negative text embeddings are generated from the `negative_prompt` input argument. | `None` |
| `ip_adapter_image` | `PipelineImageInput`, *optional* | Optional image input to work with IP Adapters. | `None` |
| `ip_adapter_image_embeds` | `List[torch.FloatTensor]`, *optional* | Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. If not provided, embeddings are computed from the `ip_adapter_image` input argument. | `None` |
| `output_type` | `str`, *optional*, defaults to `"pil"` | The output format of the generated image. Choose between `PIL.Image` or `np.array`. | `'pil'` |
| `return_dict` | `bool`, *optional*, defaults to `True` | Whether or not to return a `StableDiffusionXLPipelineOutput` instead of a plain tuple. | `True` |
| `cross_attention_kwargs` | `dict`, *optional* | A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor`. | `None` |
| `controlnet_conditioning_scale` | `float` or `List[float]`, *optional*, defaults to 1.0 | The outputs of the ControlNet are multiplied by `controlnet_conditioning_scale` before they are added to the residual in the original `unet`. | `1.0` |
| `guess_mode` | `bool`, *optional*, defaults to `False` | The ControlNet encoder tries to recognize the content of the input image even if you remove all prompts. A `guidance_scale` value between 3.0 and 5.0 is recommended. | `False` |
| `control_guidance_start` | `float` or `List[float]`, *optional*, defaults to 0.0 | The percentage of total steps at which the ControlNet starts applying. | `0.0` |
| `control_guidance_end` | `float` or `List[float]`, *optional*, defaults to 1.0 | The percentage of total steps at which the ControlNet stops applying. | `1.0` |
| `original_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | If `original_size` is not the same as `target_size`, the image will appear to be down- or upsampled. `original_size` defaults to `(height, width)` if not specified. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `None` |
| `crops_coords_top_left` | `Tuple[int]`, *optional*, defaults to (0, 0) | `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `(0, 0)` |
| `target_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | For most cases, `target_size` should be set to the desired height and width of the generated image. If not specified, it defaults to `(height, width)`. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `None` |
| `negative_original_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | To negatively condition the generation process based on a specific image resolution. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. | `None` |
| `negative_crops_coords_top_left` | `Tuple[int]`, *optional*, defaults to (0, 0) | To negatively condition the generation process based on specific crop coordinates. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. | `(0, 0)` |
| `negative_target_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | To negatively condition the generation process based on a target image resolution. It should be the same as the `target_size` for most cases. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `None` |
| `clip_skip` | `int`, *optional* | Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings. | `None` |
| `callback_on_step_end` | `Callable`, *optional* | A function that is called at the end of each denoising step during inference, with the arguments `callback_on_step_end(self, step, timestep, callback_kwargs)`. `callback_kwargs` will include a list of all tensors as specified by `callback_on_step_end_tensor_inputs`. | `None` |
| `callback_on_step_end_tensor_inputs` | `List`, *optional* | The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list will be passed as `callback_kwargs`. | `['latents']` |
Returns:

| Type | Description |
|---|---|
| `StableDiffusionXLPipelineOutput` or `tuple` | If `return_dict` is `True`, a `StableDiffusionXLPipelineOutput` is returned; otherwise a `tuple` is returned containing the output images. |
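Example: a minimal text-to-image sketch, assuming a Canny-edge SDXL ControlNet and a pipeline previously compiled with `from_pretrained(..., export=True)`; the directory name and image URL are illustrative:

```python
import torch
from diffusers.utils import load_image
from optimum.rbln import RBLNStableDiffusionXLControlNetPipeline

# Load a pipeline previously compiled for RBLN NPUs (illustrative path).
pipe = RBLNStableDiffusionXLControlNetPipeline.from_pretrained(
    "compiled_sdxl_controlnet", export=False
)

# A Canny edge map used as the ControlNet conditioning image (illustrative URL).
control_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
)

image = pipe(
    prompt="a futuristic bird, highly detailed, 8k",
    image=control_image,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=torch.manual_seed(42),
).images[0]
image.save("bird_sdxl_controlnet.png")
```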
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID hosted on the HuggingFace Hub or a path to a local directory containing a saved model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. If `False`, loads an already-compiled RBLN model from `model_id`. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules such as `text_encoder`, `unet`, `vae`, and `controlnet`. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |
Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPUs. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
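A minimal compilation sketch, assuming the diffusers `ControlNetModel` is passed through to the underlying pipeline constructor as a keyword argument (the model IDs and save directory are illustrative):

```python
from diffusers import ControlNetModel
from optimum.rbln import RBLNStableDiffusionXLControlNetPipeline

# Base SDXL checkpoint plus a ControlNet conditioned on Canny edges (illustrative IDs).
controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0")

pipe = RBLNStableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    export=True,  # compile for RBLN NPUs
    model_save_dir="compiled_sdxl_controlnet",  # illustrative path
    rbln_config={
        "img_height": 1024,
        "img_width": 1024,
        "guidance_scale": 5.0,  # UNet/ControlNet batch size doubles accordingly
    },
)
```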
Classes¶
RBLNStableDiffusionXLControlNetImg2ImgPipeline¶
Bases: RBLNDiffusionMixin, StableDiffusionXLControlNetImg2ImgPipeline
RBLN-accelerated implementation of Stable Diffusion XL pipeline with ControlNet for high-resolution guided image-to-image generation.
This pipeline compiles Stable Diffusion XL and ControlNet models to run efficiently on RBLN NPUs, enabling high-performance inference for transforming input images with precise structural control and enhanced quality preservation.
Functions¶
__call__(prompt=None, prompt_2=None, image=None, control_image=None, height=None, width=None, strength=0.8, num_inference_steps=50, guidance_scale=5.0, negative_prompt=None, negative_prompt_2=None, num_images_per_prompt=1, eta=0.0, generator=None, latents=None, prompt_embeds=None, negative_prompt_embeds=None, pooled_prompt_embeds=None, negative_pooled_prompt_embeds=None, ip_adapter_image=None, ip_adapter_image_embeds=None, output_type='pil', return_dict=True, cross_attention_kwargs=None, controlnet_conditioning_scale=0.8, guess_mode=False, control_guidance_start=0.0, control_guidance_end=1.0, original_size=None, crops_coords_top_left=(0, 0), target_size=None, negative_original_size=None, negative_crops_coords_top_left=(0, 0), negative_target_size=None, aesthetic_score=6.0, negative_aesthetic_score=2.5, clip_skip=None, callback_on_step_end=None, callback_on_step_end_tensor_inputs=['latents'], **kwargs)¶
Function invoked when calling the pipeline for generation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` or `List[str]`, *optional* | The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead. | `None` |
| `prompt_2` | `str` or `List[str]`, *optional* | The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is used in both text-encoders. | `None` |
| `image` | `torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]` | The initial image to be used as the starting point for the image generation process. Can also accept image latents as `image`. | required |
| `control_image` | `torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]` | The ControlNet input condition. ControlNet uses this input condition to generate guidance for the UNet. If the type is specified as `torch.FloatTensor`, it is passed to the ControlNet as is; `PIL.Image.Image` can also be accepted as an image. | required |
| `height` | `int`, *optional*, defaults to the size of `control_image` | The height in pixels of the generated image. Anything below 512 pixels won't work well for stabilityai/stable-diffusion-xl-base-1.0 and checkpoints that are not specifically fine-tuned on low resolutions. | `None` |
| `width` | `int`, *optional*, defaults to the size of `control_image` | The width in pixels of the generated image. Anything below 512 pixels won't work well for stabilityai/stable-diffusion-xl-base-1.0 and checkpoints that are not specifically fine-tuned on low resolutions. | `None` |
| `strength` | `float`, *optional*, defaults to 0.8 | Indicates the extent to transform the reference `image`. Must be between 0 and 1; `image` is used as a starting point, and more noise is added the higher the `strength`. | `0.8` |
| `num_inference_steps` | `int`, *optional*, defaults to 50 | The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. | `50` |
| `guidance_scale` | `float`, *optional*, defaults to 5.0 | Guidance scale as defined in Classifier-Free Diffusion Guidance. Higher values encourage the model to generate images closely linked to the text `prompt`, usually at the expense of lower image quality. Guidance is enabled when `guidance_scale > 1`. | `5.0` |
| `negative_prompt` | `str` or `List[str]`, *optional* | The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`). | `None` |
| `negative_prompt_2` | `str` or `List[str]`, *optional* | The prompt or prompts not to guide the image generation, to be sent to `tokenizer_2` and `text_encoder_2`. | `None` |
| `num_images_per_prompt` | `int`, *optional*, defaults to 1 | The number of images to generate per prompt. | `1` |
| `eta` | `float`, *optional*, defaults to 0.0 | Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to the `DDIMScheduler`; ignored in other schedulers. | `0.0` |
| `generator` | `torch.Generator` or `List[torch.Generator]`, *optional* | One or a list of torch generator(s) to make generation deterministic. | `None` |
| `latents` | `torch.FloatTensor`, *optional* | Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`. | `None` |
| `prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the `prompt` input argument. | `None` |
| `negative_prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the `negative_prompt` input argument. | `None` |
| `pooled_prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled text embeddings will be generated from the `prompt` input argument. | `None` |
| `negative_pooled_prompt_embeds` | `torch.FloatTensor`, *optional* | Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds will be generated from the `negative_prompt` input argument. | `None` |
| `ip_adapter_image` | `PipelineImageInput`, *optional* | Optional image input to work with IP Adapters. | `None` |
| `ip_adapter_image_embeds` | `List[torch.FloatTensor]`, *optional* | Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. If not provided, embeddings are computed from the `ip_adapter_image` input argument. | `None` |
| `output_type` | `str`, *optional*, defaults to `"pil"` | The output format of the generated image. Choose between `PIL.Image` or `np.array`. | `'pil'` |
| `return_dict` | `bool`, *optional*, defaults to `True` | Whether or not to return a `StableDiffusionXLPipelineOutput` instead of a plain tuple. | `True` |
| `cross_attention_kwargs` | `dict`, *optional* | A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor`. | `None` |
| `controlnet_conditioning_scale` | `float` or `List[float]`, *optional*, defaults to 0.8 | The outputs of the ControlNet are multiplied by `controlnet_conditioning_scale` before they are added to the residual in the original UNet. | `0.8` |
| `guess_mode` | `bool`, *optional*, defaults to `False` | In this mode, the ControlNet encoder will try its best to recognize the content of the input image even if you remove all prompts. A `guidance_scale` between 3.0 and 5.0 is recommended. | `False` |
| `control_guidance_start` | `float` or `List[float]`, *optional*, defaults to 0.0 | The percentage of total steps at which the ControlNet starts applying. | `0.0` |
| `control_guidance_end` | `float` or `List[float]`, *optional*, defaults to 1.0 | The percentage of total steps at which the ControlNet stops applying. | `1.0` |
| `original_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | If `original_size` is not the same as `target_size`, the image will appear to be down- or upsampled. `original_size` defaults to `(height, width)` if not specified. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `None` |
| `crops_coords_top_left` | `Tuple[int]`, *optional*, defaults to (0, 0) | `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `(0, 0)` |
| `target_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | For most cases, `target_size` should be set to the desired height and width of the generated image. If not specified, it defaults to `(height, width)`. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `None` |
| `negative_original_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | To negatively condition the generation process based on a specific image resolution. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. | `None` |
| `negative_crops_coords_top_left` | `Tuple[int]`, *optional*, defaults to (0, 0) | To negatively condition the generation process based on specific crop coordinates. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. | `(0, 0)` |
| `negative_target_size` | `Tuple[int]`, *optional*, defaults to (1024, 1024) | To negatively condition the generation process based on a target image resolution. It should be the same as the `target_size` for most cases. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `None` |
| `aesthetic_score` | `float`, *optional*, defaults to 6.0 | Used to simulate an aesthetic score of the generated image by influencing the positive text condition. Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. | `6.0` |
| `negative_aesthetic_score` | `float`, *optional*, defaults to 2.5 | Part of SDXL's micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. Can be used to simulate an aesthetic score of the generated image by influencing the negative text condition. | `2.5` |
| `clip_skip` | `int`, *optional* | Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings. | `None` |
| `callback_on_step_end` | `Callable`, *optional* | A function that is called at the end of each denoising step during inference, with the arguments `callback_on_step_end(self, step, timestep, callback_kwargs)`. `callback_kwargs` will include a list of all tensors as specified by `callback_on_step_end_tensor_inputs`. | `None` |
| `callback_on_step_end_tensor_inputs` | `List`, *optional* | The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list will be passed as `callback_kwargs`. | `['latents']` |
Returns:

| Type | Description |
|---|---|
| `StableDiffusionXLPipelineOutput` or `tuple` | If `return_dict` is `True`, a `StableDiffusionXLPipelineOutput` is returned; otherwise a `tuple` is returned containing the output images. |
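Example: a minimal image-to-image sketch, assuming a previously compiled pipeline; the directory, file names, and prompt are illustrative:

```python
import torch
from diffusers.utils import load_image
from optimum.rbln import RBLNStableDiffusionXLControlNetImg2ImgPipeline

# Load a pipeline previously compiled for RBLN NPUs (illustrative path).
pipe = RBLNStableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "compiled_sdxl_controlnet_i2i", export=False
)

init_image = load_image("input.png")         # image to transform (illustrative)
control_image = load_image("depth_map.png")  # ControlNet condition (illustrative)

image = pipe(
    prompt="a photorealistic render, cinematic lighting",
    image=init_image,
    control_image=control_image,
    strength=0.8,
    controlnet_conditioning_scale=0.8,
    generator=torch.manual_seed(0),
).images[0]
image.save("output.png")
```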
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID hosted on the HuggingFace Hub or a path to a local directory containing a saved model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPU execution. If `False`, loads an already-compiled RBLN model from `model_id`. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules such as `text_encoder`, `unet`, `vae`, and `controlnet`. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |
Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPUs. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin. |
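Since LoRA weights are fused at compile time, they must be supplied to `from_pretrained` together with `export=True`. A hedged sketch; the adapter ID, weight file name, and scale are illustrative:

```python
from diffusers import ControlNetModel
from optimum.rbln import RBLNStableDiffusionXLControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0")

# LoRA adapters are fused into the model weights during compilation (export=True only).
pipe = RBLNStableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    export=True,
    lora_ids="some-user/sdxl-lora-adapter",               # illustrative adapter ID
    lora_weights_names="pytorch_lora_weights.safetensors",  # illustrative file name
    lora_scales=0.7,
)
```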
Classes¶
RBLNStableDiffusionXLControlNetPipelineBaseConfig¶
Bases: RBLNModelConfig
Base configuration for Stable Diffusion XL ControlNet pipelines.
Functions¶
__init__(text_encoder=None, text_encoder_2=None, unet=None, vae=None, controlnet=None, *, batch_size=None, img_height=None, img_width=None, height=None, width=None, sample_size=None, image_size=None, guidance_scale=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text_encoder` | `Optional[RBLNCLIPTextModelConfig]` | Configuration for the primary text encoder. Initialized as RBLNCLIPTextModelConfig if not provided. | `None` |
| `text_encoder_2` | `Optional[RBLNCLIPTextModelWithProjectionConfig]` | Configuration for the secondary text encoder. Initialized as RBLNCLIPTextModelWithProjectionConfig if not provided. | `None` |
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | `None` |
| `vae` | `Optional[RBLNAutoencoderKLConfig]` | Configuration for the VAE model component. Initialized as RBLNAutoencoderKLConfig if not provided. | `None` |
| `controlnet` | `Optional[RBLNControlNetModelConfig]` | Configuration for the ControlNet model component. Initialized as RBLNControlNetModelConfig if not provided. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Alternative way to specify image dimensions. Cannot be used together with img_height/img_width. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
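Alternatively, a hedged sketch that pins the doubled runtime batch sizes explicitly, assuming RBLNUNet2DConditionModelConfig and RBLNControlNetModelConfig are exported from optimum.rbln and accept batch_size (an assumption based on the submodule types listed above):

```python
from optimum.rbln import (
    RBLNControlNetModelConfig,
    RBLNStableDiffusionXLControlNetPipelineConfig,
    RBLNUNet2DConditionModelConfig,
)

# Inference batch size 1 with classifier-free guidance: the UNet and
# ControlNet run at batch size 2, so pin both explicitly.
config = RBLNStableDiffusionXLControlNetPipelineConfig(
    batch_size=1,
    unet=RBLNUNet2DConditionModelConfig(batch_size=2),
    controlnet=RBLNControlNetModelConfig(batch_size=2),
)
```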
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
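A short sketch of `load` with an override, assuming a config was previously saved alongside compiled artifacts (the path is illustrative); the `rbln_` prefix is stripped before the value is applied:

```python
from optimum.rbln import RBLNStableDiffusionXLControlNetPipelineConfig

# Load a stored config and override its batch size via an "rbln_"-prefixed key.
config = RBLNStableDiffusionXLControlNetPipelineConfig.load(
    "compiled_sdxl_controlnet",  # directory containing the config file (illustrative)
    rbln_batch_size=2,
)
```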
RBLNStableDiffusionXLControlNetPipelineConfig¶
Bases: RBLNStableDiffusionXLControlNetPipelineBaseConfig
Configuration for Stable Diffusion XL ControlNet pipeline.
Functions¶
__init__(text_encoder=None, text_encoder_2=None, unet=None, vae=None, controlnet=None, *, batch_size=None, img_height=None, img_width=None, height=None, width=None, sample_size=None, image_size=None, guidance_scale=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text_encoder` | `Optional[RBLNCLIPTextModelConfig]` | Configuration for the primary text encoder. Initialized as RBLNCLIPTextModelConfig if not provided. | `None` |
| `text_encoder_2` | `Optional[RBLNCLIPTextModelWithProjectionConfig]` | Configuration for the secondary text encoder. Initialized as RBLNCLIPTextModelWithProjectionConfig if not provided. | `None` |
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | `None` |
| `vae` | `Optional[RBLNAutoencoderKLConfig]` | Configuration for the VAE model component. Initialized as RBLNAutoencoderKLConfig if not provided. | `None` |
| `controlnet` | `Optional[RBLNControlNetModelConfig]` | Configuration for the ControlNet model component. Initialized as RBLNControlNetModelConfig if not provided. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Alternative way to specify image dimensions. Cannot be used together with img_height/img_width. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
RBLNStableDiffusionXLControlNetImg2ImgPipelineConfig¶
Bases: RBLNStableDiffusionXLControlNetPipelineBaseConfig
Configuration for Stable Diffusion XL ControlNet image-to-image pipeline.
Functions¶
__init__(text_encoder=None, text_encoder_2=None, unet=None, vae=None, controlnet=None, *, batch_size=None, img_height=None, img_width=None, height=None, width=None, sample_size=None, image_size=None, guidance_scale=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text_encoder` | `Optional[RBLNCLIPTextModelConfig]` | Configuration for the primary text encoder. Initialized as RBLNCLIPTextModelConfig if not provided. | `None` |
| `text_encoder_2` | `Optional[RBLNCLIPTextModelWithProjectionConfig]` | Configuration for the secondary text encoder. Initialized as RBLNCLIPTextModelWithProjectionConfig if not provided. | `None` |
| `unet` | `Optional[RBLNUNet2DConditionModelConfig]` | Configuration for the UNet model component. Initialized as RBLNUNet2DConditionModelConfig if not provided. | `None` |
| `vae` | `Optional[RBLNAutoencoderKLConfig]` | Configuration for the VAE model component. Initialized as RBLNAutoencoderKLConfig if not provided. | `None` |
| `controlnet` | `Optional[RBLNControlNetModelConfig]` | Configuration for the ControlNet model component. Initialized as RBLNControlNetModelConfig if not provided. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `img_height` | `Optional[int]` | Height of the generated images. | `None` |
| `img_width` | `Optional[int]` | Width of the generated images. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `sample_size` | `Optional[Tuple[int, int]]` | Spatial dimensions for the UNet model. | `None` |
| `image_size` | `Optional[Tuple[int, int]]` | Alternative way to specify image dimensions. Cannot be used together with img_height/img_width. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent RBLNModelConfig. | `{}` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If both image_size and img_height/img_width are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with 'rbln_' will have the prefix removed and be used to update the configuration. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| RBLNModelConfig | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.