Stable Video Diffusion¶
Stable Video Diffusion is an image-to-video latent diffusion model that generates videos from an input image. RBLN NPUs can accelerate Stable Video Diffusion pipelines through Optimum RBLN.
Supported Pipelines¶
Optimum RBLN supports the following Stable Video Diffusion pipeline:
- Image-to-Video: generates a video from an input image
Important: Batch Size Configuration for Guidance Scale¶
Batch Size and Guidance Scale
When Stable Video Diffusion is used with a max guidance scale > 1.0 (the default is 3.0), the classifier-free guidance technique doubles the UNet's effective batch size at runtime.
Because RBLN NPUs use static graph compilation, the UNet batch size at compile time must match the batch size at runtime; otherwise, an error occurs during inference.
Default Behavior¶
If you do not explicitly specify the UNet batch size, Optimum RBLN behaves as follows:
- It assumes the default max guidance scale (3.0) will be used
- It automatically sets the UNet batch size to twice the pipeline batch size
If you plan to use the default max guidance scale (a value greater than 1.0), this configuration works correctly out of the box. However, if you use a different guidance scale or need more control, you should configure the UNet batch size explicitly.
Example: Setting the UNet Batch Size Explicitly¶
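A minimal sketch of compiling with an explicit UNet batch size. The checkpoint ID is an assumption, not part of this document; substitute your own model.

```python
# Hedged sketch: compile with an explicitly doubled UNet batch size so the
# compiled graph matches the runtime batch when classifier-free guidance
# is active. The checkpoint ID below is an assumption.
from optimum.rbln import RBLNStableVideoDiffusionPipeline

pipeline = RBLNStableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    export=True,  # compile the PyTorch model for RBLN NPUs
    rbln_config={
        "batch_size": 1,            # pipeline batch size
        "unet": {"batch_size": 2},  # 2x: guidance scale > 1.0 doubles the UNet batch
    },
)
```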
Example: Using a Max Guidance Scale of 1.0¶
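A hedged sketch for the opposite case: with a max guidance scale of 1.0, classifier-free guidance is disabled and the UNet batch size should not be doubled. The checkpoint ID is an assumption.

```python
# Hedged sketch: compiling for max guidance scale 1.0, where classifier-free
# guidance is disabled and the UNet batch equals the pipeline batch.
from optimum.rbln import RBLNStableVideoDiffusionPipeline

pipeline = RBLNStableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    export=True,
    rbln_config={
        "batch_size": 1,
        "guidance_scale": 1.0,      # no CFG: UNet batch size is not doubled
        "unet": {"batch_size": 1},  # match the pipeline batch size explicitly
    },
)

# At inference time, pass the same scale so runtime matches compilation:
# frames = pipeline(image, max_guidance_scale=1.0).frames[0]
```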
Usage Example¶
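An end-to-end sketch: compile once, then generate a video from a conditioning image. The checkpoint ID and image URL are assumptions; `load_image` and `export_to_video` are standard Diffusers utilities.

```python
# Hedged end-to-end sketch: compile, generate, and export frames to a video.
# The checkpoint ID and image URL below are assumptions.
from diffusers.utils import export_to_video, load_image
from optimum.rbln import RBLNStableVideoDiffusionPipeline

pipeline = RBLNStableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    export=True,
    rbln_config={"height": 576, "width": 1024, "num_frames": 14},
)

# Conditioning image, resized to the compiled resolution.
image = load_image("https://example.com/input.png")
image = image.resize((1024, 576))

frames = pipeline(image, decode_chunk_size=4).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```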
API Reference¶
Classes¶
RBLNStableVideoDiffusionPipeline¶
Bases: RBLNDiffusionMixin, StableVideoDiffusionPipeline
RBLN-accelerated implementation of Stable Video Diffusion pipeline for image-to-video generation.
This pipeline compiles Stable Video Diffusion models to run efficiently on RBLN NPUs, enabling high-performance inference for generating videos from images with optimized memory usage and throughput.
Functions¶
from_pretrained(model_id, *, export=None, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod¶
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_id` | `str` | The model ID or path to the pretrained model to load. Can be either a model ID hosted on the Hugging Face Hub or a path to a local directory containing the model. | required |
| `export` | `bool` | If `True`, takes a PyTorch model from `model_id` and compiles it for RBLN NPUs; if `False`, loads an already compiled RBLN model. | `None` |
| `model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
| `rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules such as the UNet or VAE. | `{}` |
| `lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
| `lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
| `lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
| `kwargs` | `Any` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |
Returns:

| Type | Description |
|---|---|
| `RBLNDiffusionMixin` | A compiled or loaded diffusion pipeline that can be used for inference on RBLN NPUs. The returned object is an instance of the class that called this method, inheriting from `RBLNDiffusionMixin`. |
Classes¶
RBLNStableVideoDiffusionPipelineConfig¶
Bases: RBLNModelConfig
Functions¶
__init__(image_encoder=None, unet=None, vae=None, *, batch_size=None, height=None, width=None, num_frames=None, decode_chunk_size=None, guidance_scale=None, **kwargs)¶
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `image_encoder` | `Optional[RBLNCLIPVisionModelWithProjectionConfig]` | Configuration for the image encoder component. Initialized as `RBLNCLIPVisionModelWithProjectionConfig` if not provided. | `None` |
| `unet` | `Optional[RBLNUNetSpatioTemporalConditionModelConfig]` | Configuration for the UNet model component. Initialized as `RBLNUNetSpatioTemporalConditionModelConfig` if not provided. | `None` |
| `vae` | `Optional[RBLNAutoencoderKLTemporalDecoderConfig]` | Configuration for the VAE model component. Initialized as `RBLNAutoencoderKLTemporalDecoderConfig` if not provided. | `None` |
| `batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
| `height` | `Optional[int]` | Height of the generated images. | `None` |
| `width` | `Optional[int]` | Width of the generated images. | `None` |
| `num_frames` | `Optional[int]` | The number of frames in the generated video. | `None` |
| `decode_chunk_size` | `Optional[int]` | The number of frames to decode at once during VAE decoding. Useful for managing memory usage during video generation. | `None` |
| `guidance_scale` | `Optional[float]` | Scale for classifier-free guidance. | `None` |
| `kwargs` | `Any` | Additional arguments passed to the parent `RBLNModelConfig`. | `{}` |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If both `image_size` and `height`/`width` are provided. |
Note
When guidance_scale > 1.0, the UNet batch size is automatically doubled to accommodate classifier-free guidance.
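The parameters above can be sketched as a direct config construction, assuming the config class is importable from `optimum.rbln` as shown in this reference:

```python
# Hedged sketch: building the pipeline config with the documented parameters.
from optimum.rbln import RBLNStableVideoDiffusionPipelineConfig

config = RBLNStableVideoDiffusionPipelineConfig(
    batch_size=1,
    height=576,
    width=1024,
    num_frames=14,
    decode_chunk_size=4,
    guidance_scale=3.0,  # > 1.0, so the UNet batch size is doubled automatically
)
```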
load(path, **kwargs) classmethod¶
Load a RBLNModelConfig from a path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to the RBLNModelConfig file or directory containing the config file. | required |
| `kwargs` | `Any` | Additional keyword arguments to override configuration values. Keys starting with `rbln_` will have the prefix removed and be used to update the configuration. | `{}` |
Returns:

| Name | Type | Description |
|---|---|---|
| `RBLNModelConfig` | `RBLNModelConfig` | The loaded configuration instance. |
Note
This method loads the configuration from the specified path and applies any provided overrides. If the loaded configuration class doesn't match the expected class, a warning will be logged.
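A hedged sketch of the load-and-override behavior described above; the directory path is a placeholder:

```python
# Hedged sketch: reload a saved config and override one value via a
# 'rbln_'-prefixed keyword, per the documented prefix-stripping behavior.
from optimum.rbln import RBLNStableVideoDiffusionPipelineConfig

config = RBLNStableVideoDiffusionPipelineConfig.load(
    "path/to/compiled_model",  # directory containing the config file
    rbln_decode_chunk_size=2,  # prefix stripped; overrides decode_chunk_size
)
```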