
Cosmos

Cosmos World Foundation Models specialize in generating videos and virtual-world states that accurately reflect the laws of physics. Built on diffusion techniques, they produce dynamic, high-quality videos from a variety of inputs such as text and images, and they therefore serve as a core foundation for research and applications in virtual world generation. RBLN NPUs can accelerate Cosmos pipelines through Optimum RBLN.

Supported Pipelines

Optimum RBLN supports several Cosmos pipelines:

  • Text-to-Video: generates high-quality videos from text prompts.
  • Video-to-Video: generates high-quality videos from an input video and a text prompt.

Key Classes

Default Behavior

The Cosmos pipelines include a guardrail model (RBLNSafetyChecker). This guardrail model performs the following three roles in the Cosmos pipeline:

  • Checks the user's input text prompt for inappropriate expressions.
  • Checks videos generated by the Cosmos pipeline for inappropriate content.
  • If a video generated by the Cosmos pipeline contains facial regions, applies pixelation to those pixels.
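These guardrail stages correspond to the four submodule keys used in the safety_checker configuration later in this document (aegis for prompt filtering, siglip_encoder and video_safety_model for checking generated frames, and face_blur_filter for face pixelation). As a minimal sketch, the device placement for each stage can be written as a plain dictionary; the device indices below are illustrative assumptions for an 8-device setup, not requirements:

```python
# Illustrative device placement for each guardrail stage.
# Keys follow the safety_checker configuration shown in the usage examples;
# the device numbers themselves are assumptions, not defaults.
safety_checker_config = {
    "aegis": {                            # text prompt safety filter
        "tensor_parallel_size": 4,
        "device": [4, 5, 6, 7],
    },
    "siglip_encoder": {"device": 4},      # embeds generated frames
    "video_safety_model": {"device": 4},  # flags unsafe video content
    "face_blur_filter": {"device": 4},    # pixelates detected faces
}

# Every stage needs at least one device assignment.
assert all("device" in cfg for cfg in safety_checker_config.values())
```

This dictionary can be passed as the safety_checker entry of the pipeline configuration, as the full examples below show.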

Important: Cosmos Guardrail Model

NVIDIA Open Model License

Under the NVIDIA Open Model License, if you bypass, disable, weaken, or otherwise circumvent the guardrail model in Cosmos, your rights under the license terminate automatically.

Usage Example (Text-to-Video)

from diffusers.utils import export_to_video
from optimum.rbln import RBLNCosmosTextToWorldPipeline, RBLNCosmosTextToWorldPipelineConfig

# Create a configuration object (optional; defaults can be used)
config = RBLNCosmosTextToWorldPipelineConfig(
    height=704,
    width=1280,
    transformer={
        "tensor_parallel_size": 4,
        "device": [0, 1, 2, 3],
    },
    text_encoder={
        "device": 2,
    },
    vae={
        "device": 3,
    },
    safety_checker={
        "aegis": {
            "tensor_parallel_size": 4,
            "device": [4, 5, 6, 7],
        },
        "siglip_encoder": {"device": 4},
        "video_safety_model": {"device": 4},
        "face_blur_filter": {"device": 4},
    }
)

# Load and compile the Cosmos model for RBLN NPUs
pipe = RBLNCosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World",
    export=True,
    rbln_config=config,
)

# Generate a video
prompt = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
output = pipe(prompt=prompt).frames[0]

# Save the generated video
export_to_video(output, "output.mp4", fps=30)
print("Video saved to output.mp4")

Usage Example (Video-to-Video)

from diffusers.utils import export_to_video, load_video
from optimum.rbln import RBLNCosmosVideoToWorldPipeline, RBLNCosmosVideoToWorldPipelineConfig

# Create a configuration object (optional; defaults can be used)
config = RBLNCosmosVideoToWorldPipelineConfig(
    height=704,
    width=1280,
    transformer={
        "tensor_parallel_size": 4,
        "device": [0, 1, 2, 3],
    },
    text_encoder={
        "device": 4,
    },
    vae={
        "device_map": {"encoder": 5, "decoder": 6},
    },
    safety_checker={
        "aegis": {
            "tensor_parallel_size": 4,
            "device": [4, 5, 6, 7],
        },
        "siglip_encoder": {"device": 7},
        "video_safety_model": {"device": 7},
        "face_blur_filter": {"device": 7},
    }
)

# Load and compile the Cosmos model for RBLN NPUs
pipe = RBLNCosmosVideoToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Video2World",
    export=True,
    rbln_config=config,
)

# Load the input video and generate a new one
video = load_video("https://github.com/nvidia-cosmos/cosmos-predict1/raw/refs/heads/main/assets/diffusion/video2world_input1.mp4")
prompt = "A dynamic and visually captivating video showcases a sleek, dark-colored SUV driving along a narrow dirt road that runs parallel to a vast, expansive ocean. The setting is a rugged coastal landscape, with the road cutting through dry, golden-brown grass that stretches across rolling hills. The ocean, a deep blue, extends to the horizon, providing a stunning backdrop to the scene. The SUV moves swiftly along the road, kicking up a trail of dust that lingers in the air behind it, emphasizing the speed and power of the vehicle. The camera maintains a steady tracking shot, following the SUV from a slightly elevated angle, which allows for a clear view of both the vehicle and the surrounding scenery. The lighting is natural, suggesting a time of day when the sun is high, casting minimal shadows and highlighting the textures of the grass and the glint of the ocean. The video captures the essence of freedom and adventure, with the SUV navigating the isolated road with ease, suggesting a journey or exploration theme. The consistent motion of the vehicle and the dust trail create a sense of continuity and fluidity throughout the video, making it engaging and immersive."
output = pipe(video=video, prompt=prompt).frames[0]

# Save the generated video
export_to_video(output, "output.mp4", fps=30)
print("Video saved to output.mp4")

API 참조

Classes

RBLNCosmosTextToWorldPipeline

Bases: RBLNDiffusionMixin, CosmosTextToWorldPipeline

RBLN-accelerated implementation of Cosmos Text to World pipeline for text-to-video generation.

This pipeline compiles Cosmos Text to World models to run efficiently on RBLN NPUs, enabling high-performance inference for generating videos with distinctive artistic style and enhanced visual quality.

Functions

from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod

Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.

This method has two distinct operating modes:

  • When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
  • When export=False: Loads an already compiled RBLN model from model_id without recompilation

It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.

Parameters:

Name Type Description Default
model_id str

The model ID or path to the pretrained model to load. Can be either:

  • A model ID from the HuggingFace Hub
  • A local path to a saved model directory
required
export bool

If True, takes a PyTorch model from model_id and compiles it for RBLN NPU execution. If False, loads an already compiled RBLN model from model_id without recompilation.

False
model_save_dir Optional[PathLike]

Directory to save the compiled model artifacts. Only used when export=True. If not provided and export=True, a temporary directory is used.

None
rbln_config Dict[str, Any]

Configuration options for RBLN compilation. Can include settings for specific submodules such as text_encoder, unet, and vae. Configuration can be tailored to the specific pipeline being compiled.

{}
lora_ids Optional[Union[str, List[str]]]

LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when export=True.

None
lora_weights_names Optional[Union[str, List[str]]]

Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when export=True.

None
lora_scales Optional[Union[float, List[float]]]

Scaling factor(s) to apply to the LoRA adapter(s). Only used when export=True.

None
**kwargs Dict[str, Any]

Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used.

{}

Returns:

Type Description
Self

A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin.
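The two operating modes described above combine naturally into a compile-once, load-many workflow: compile with export=True and model_save_dir on the first run, then reload the saved artifacts with export=False. The sketch below illustrates this; the directory name is an illustrative assumption, and the code requires RBLN hardware and the optimum.rbln package to actually run:

```python
from optimum.rbln import RBLNCosmosTextToWorldPipeline

# First run: compile from the PyTorch checkpoint and save the artifacts.
pipe = RBLNCosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World",
    export=True,
    model_save_dir="compiled_cosmos_t2w",  # illustrative directory name
)

# Later runs: load the compiled artifacts directly, skipping recompilation.
pipe = RBLNCosmosTextToWorldPipeline.from_pretrained(
    "compiled_cosmos_t2w",
    export=False,
)
```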

Classes

RBLNCosmosVideoToWorldPipeline

Bases: RBLNDiffusionMixin, CosmosVideoToWorldPipeline

RBLN-accelerated implementation of Cosmos Video to World pipeline for video-to-video generation.

This pipeline compiles Cosmos Video to World models to run efficiently on RBLN NPUs, enabling high-performance inference for generating videos with distinctive artistic style and enhanced visual quality.

Functions

from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod

Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.

This method has two distinct operating modes:

  • When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
  • When export=False: Loads an already compiled RBLN model from model_id without recompilation

It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.

Parameters:

Name Type Description Default
model_id str

The model ID or path to the pretrained model to load. Can be either:

  • A model ID from the HuggingFace Hub
  • A local path to a saved model directory
required
export bool

If True, takes a PyTorch model from model_id and compiles it for RBLN NPU execution. If False, loads an already compiled RBLN model from model_id without recompilation.

False
model_save_dir Optional[PathLike]

Directory to save the compiled model artifacts. Only used when export=True. If not provided and export=True, a temporary directory is used.

None
rbln_config Dict[str, Any]

Configuration options for RBLN compilation. Can include settings for specific submodules such as text_encoder, unet, and vae. Configuration can be tailored to the specific pipeline being compiled.

{}
lora_ids Optional[Union[str, List[str]]]

LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when export=True.

None
lora_weights_names Optional[Union[str, List[str]]]

Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when export=True.

None
lora_scales Optional[Union[float, List[float]]]

Scaling factor(s) to apply to the LoRA adapter(s). Only used when export=True.

None
**kwargs Dict[str, Any]

Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used.

{}

Returns:

Type Description
Self

A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin.

Classes

RBLNCosmosPipelineBaseConfig

Bases: RBLNModelConfig

Functions

__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)

Parameters:

Name Type Description Default
text_encoder Optional[RBLNT5EncoderModelConfig]

Configuration for the text encoder component. Initialized as RBLNT5EncoderModelConfig if not provided.

None
transformer Optional[RBLNCosmosTransformer3DModelConfig]

Configuration for the Transformer model component. Initialized as RBLNCosmosTransformer3DModelConfig if not provided.

None
vae Optional[RBLNAutoencoderKLCosmosConfig]

Configuration for the VAE model component. Initialized as RBLNAutoencoderKLCosmosConfig if not provided.

None
safety_checker Optional[RBLNCosmosSafetyCheckerConfig]

Configuration for the safety checker component. Initialized as RBLNCosmosSafetyCheckerConfig if not provided.

None
batch_size Optional[int]

Batch size for inference, applied to all submodules.

None
height Optional[int]

Height of the generated videos.

None
width Optional[int]

Width of the generated videos.

None
num_frames Optional[int]

The number of frames in the generated video.

None
fps Optional[int]

The frames per second of the generated video.

None
max_seq_len Optional[int]

Maximum sequence length supported by the model.

None
**kwargs Dict[str, Any]

Additional arguments passed to the parent RBLNModelConfig.

{}
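Because from_pretrained accepts rbln_config as a plain Dict[str, Any], the parameters above can also be supplied without constructing a config class: pipeline-level keys sit at the top level, and submodule keys hold nested dicts. A minimal sketch, where all values are illustrative rather than required defaults:

```python
# Plain-dict form of the pipeline configuration, as accepted by
# from_pretrained(rbln_config=...). Top-level keys are pipeline-wide
# generation settings; nested dicts configure individual submodules.
# All values here are illustrative assumptions.
rbln_config = {
    "height": 704,
    "width": 1280,
    "num_frames": 121,
    "fps": 30,
    "max_seq_len": 512,
    "transformer": {"tensor_parallel_size": 4, "device": [0, 1, 2, 3]},
    "text_encoder": {"device": 2},
    "vae": {"device": 3},
}

# Submodule entries are dicts; pipeline-level entries are scalars.
assert isinstance(rbln_config["transformer"], dict)
assert isinstance(rbln_config["height"], int)
```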

RBLNCosmosTextToWorldPipelineConfig

Bases: RBLNCosmosPipelineBaseConfig

Config for Cosmos Text2World Pipeline

Functions

__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)

Parameters:

Name Type Description Default
text_encoder Optional[RBLNT5EncoderModelConfig]

Configuration for the text encoder component. Initialized as RBLNT5EncoderModelConfig if not provided.

None
transformer Optional[RBLNCosmosTransformer3DModelConfig]

Configuration for the Transformer model component. Initialized as RBLNCosmosTransformer3DModelConfig if not provided.

None
vae Optional[RBLNAutoencoderKLCosmosConfig]

Configuration for the VAE model component. Initialized as RBLNAutoencoderKLCosmosConfig if not provided.

None
safety_checker Optional[RBLNCosmosSafetyCheckerConfig]

Configuration for the safety checker component. Initialized as RBLNCosmosSafetyCheckerConfig if not provided.

None
batch_size Optional[int]

Batch size for inference, applied to all submodules.

None
height Optional[int]

Height of the generated videos.

None
width Optional[int]

Width of the generated videos.

None
num_frames Optional[int]

The number of frames in the generated video.

None
fps Optional[int]

The frames per second of the generated video.

None
max_seq_len Optional[int]

Maximum sequence length supported by the model.

None
**kwargs Dict[str, Any]

Additional arguments passed to the parent RBLNModelConfig.

{}

RBLNCosmosVideoToWorldPipelineConfig

Bases: RBLNCosmosPipelineBaseConfig

Config for Cosmos Video2World Pipeline

Functions

__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)

Parameters:

Name Type Description Default
text_encoder Optional[RBLNT5EncoderModelConfig]

Configuration for the text encoder component. Initialized as RBLNT5EncoderModelConfig if not provided.

None
transformer Optional[RBLNCosmosTransformer3DModelConfig]

Configuration for the Transformer model component. Initialized as RBLNCosmosTransformer3DModelConfig if not provided.

None
vae Optional[RBLNAutoencoderKLCosmosConfig]

Configuration for the VAE model component. Initialized as RBLNAutoencoderKLCosmosConfig if not provided.

None
safety_checker Optional[RBLNCosmosSafetyCheckerConfig]

Configuration for the safety checker component. Initialized as RBLNCosmosSafetyCheckerConfig if not provided.

None
batch_size Optional[int]

Batch size for inference, applied to all submodules.

None
height Optional[int]

Height of the generated videos.

None
width Optional[int]

Width of the generated videos.

None
num_frames Optional[int]

The number of frames in the generated video.

None
fps Optional[int]

The frames per second of the generated video.

None
max_seq_len Optional[int]

Maximum sequence length supported by the model.

None
**kwargs Dict[str, Any]

Additional arguments passed to the parent RBLNModelConfig.

{}