Cosmos
Cosmos World Foundation Models are specialized for generating videos and world states that adhere to the laws of physics. These diffusion-based models create dynamic, high-quality videos from inputs such as text, images, or videos. RBLN NPUs can accelerate Cosmos pipelines through Optimum RBLN.
Supported Pipelines
Optimum RBLN supports the following Cosmos pipelines:
- Text-to-Video: Generate high-quality videos from text prompts.
- Video-to-Video: Generate high-quality videos from input videos and text prompts.
Key Classes
- `RBLNCosmosTextToWorldPipeline`: Text-to-video pipeline for Cosmos.
- `RBLNCosmosTextToWorldPipelineConfig`: Configuration for the text-to-video pipeline.
- `RBLNCosmosVideoToWorldPipeline`: Video-to-video pipeline for Cosmos.
- `RBLNCosmosVideoToWorldPipelineConfig`: Configuration for the video-to-video pipeline.
Default Behavior
The Cosmos pipeline includes a guardrail model, `RBLNSafetyChecker`, which performs the following three roles:
- Input Safety: It checks the user's text prompt for inappropriate language.
- Output Safety: It checks the final generated video for any unsafe content.
- Facial Blurring: It automatically pixelates any facial areas found in the generated video.
Important: Cosmos Safety Guardrails
NVIDIA Open Model License
According to the NVIDIA Open Model License, your rights will automatically terminate if you bypass, disable, weaken, or otherwise circumvent the Cosmos safety guardrails.
Usage Example (Text-to-Video)
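A minimal sketch of text-to-video generation, using only the `from_pretrained` arguments documented in the API reference. The checkpoint name, the `optimum.rbln` import path, the `.frames` attribute on the result (as in the diffusers Cosmos pipelines), and the 30 fps output rate are assumptions, not values confirmed by this page. The guardrail is left enabled, as the license requires.

```python
from typing import Any, Dict

# Assumed checkpoint name; substitute the Cosmos Text2World model you have access to.
MODEL_ID = "nvidia/Cosmos-1.0-Diffusion-7B-Text2World"


def text_to_world_kwargs(model_id: str = MODEL_ID) -> Dict[str, Any]:
    """Collect from_pretrained arguments for a first-time compile."""
    return {
        "model_id": model_id,
        "export": True,  # compile the PyTorch checkpoint for RBLN NPUs
    }


def generate_video(prompt: str, output_path: str = "text2world.mp4") -> str:
    """Compile the pipeline, run one prompt, and write the frames to disk."""
    # Imported lazily so the helper above works without the NPU stack installed.
    from diffusers.utils import export_to_video
    from optimum.rbln import RBLNCosmosTextToWorldPipeline  # assumed import path

    kwargs = text_to_world_kwargs()
    pipe = RBLNCosmosTextToWorldPipeline.from_pretrained(kwargs.pop("model_id"), **kwargs)
    # Assumption: the pipeline output exposes `.frames`, one entry per prompt.
    video = pipe(prompt=prompt).frames[0]
    export_to_video(video, output_path, fps=30)  # 30 fps is an illustrative value
    return output_path
```

Calling `generate_video("A robot arm stacks boxes on a conveyor belt.")` would need an RBLN NPU and the model weights; compilation can take a long time on the first run.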
Usage Example (Video-to-Video)
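A minimal sketch of video-to-video generation: an input clip and a text prompt condition the output. The checkpoint name, the `optimum.rbln` import path, the `video` call argument (taken from the diffusers Cosmos Video-to-World pipeline), and the 30 fps output rate are assumptions. `video_to_world_inputs` is a hypothetical helper, not part of the library.

```python
from typing import Any, Dict, List

# Assumed checkpoint name; substitute the Cosmos Video2World model you have access to.
MODEL_ID = "nvidia/Cosmos-1.0-Diffusion-7B-Video2World"


def video_to_world_inputs(prompt: str, frames: List[Any]) -> Dict[str, Any]:
    """Bundle the prompt and conditioning frames into pipeline call arguments."""
    if not frames:
        raise ValueError("the conditioning video must contain at least one frame")
    return {"prompt": prompt, "video": frames}


def extend_video(prompt: str, input_path: str, output_path: str = "video2world.mp4") -> str:
    """Compile the pipeline, condition it on an input clip, and save the result."""
    # Imported lazily so the helper above works without the NPU stack installed.
    from diffusers.utils import export_to_video, load_video
    from optimum.rbln import RBLNCosmosVideoToWorldPipeline  # assumed import path

    pipe = RBLNCosmosVideoToWorldPipeline.from_pretrained(MODEL_ID, export=True)
    frames = load_video(input_path)  # list of PIL images read from the clip
    result = pipe(**video_to_world_inputs(prompt, frames))
    # Assumption: the pipeline output exposes `.frames`, one entry per prompt.
    export_to_video(result.frames[0], output_path, fps=30)  # illustrative frame rate
    return output_path
```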
API Reference
Classes
RBLNCosmosTextToWorldPipeline
Bases: `RBLNDiffusionMixin`, `CosmosTextToWorldPipeline`
RBLN-accelerated implementation of Cosmos Text to World pipeline for text-to-video generation.
This pipeline compiles Cosmos Text to World models to run efficiently on RBLN NPUs, enabling high-performance inference for generating videos with distinctive artistic style and enhanced visual quality.
Functions
`from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)` `classmethod`
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When `export=True`: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model.
- When `export=False`: Loads an already compiled RBLN model from `model_id` without recompilation.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model_id` | `str` | The model ID or path of the pretrained model to load. | required |
`export` | `bool` | If True, takes a PyTorch model from `model_id` and compiles it for RBLN NPUs. If False, loads an already compiled RBLN model from `model_id`. | `False` |
`model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
`rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
`lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
`lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
`lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
`**kwargs` | `Dict[str, Any]` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |
Returns:
Type | Description |
---|---|
`Self` | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from `RBLNDiffusionMixin`. |
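The two operating modes of `from_pretrained` suggest a compile-once, reload-later workflow. The sketch below assumes that a non-empty `model_save_dir` from an earlier `export=True` run can be passed back as `model_id` with `export=False`. `export_needed` is a hypothetical helper, and its non-empty-directory check is only a heuristic.

```python
from pathlib import Path
from typing import Union


def export_needed(compiled_dir: Union[str, Path]) -> bool:
    """Return True when the directory is missing or empty (heuristic, see lead-in)."""
    path = Path(compiled_dir)
    return not (path.is_dir() and any(path.iterdir()))


def load_or_compile(source_model_id: str, compiled_dir: Union[str, Path]):
    """Compile on the first call, then reuse the saved artifacts afterwards."""
    # Imported lazily so export_needed() works without the NPU stack installed.
    from optimum.rbln import RBLNCosmosTextToWorldPipeline  # assumed import path

    if export_needed(compiled_dir):
        # First run: compile the PyTorch checkpoint and save the artifacts.
        return RBLNCosmosTextToWorldPipeline.from_pretrained(
            source_model_id, export=True, model_save_dir=compiled_dir
        )
    # Later runs: load the compiled model without recompilation.
    return RBLNCosmosTextToWorldPipeline.from_pretrained(str(compiled_dir), export=False)
```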
Classes
RBLNCosmosVideoToWorldPipeline
Bases: `RBLNDiffusionMixin`, `CosmosVideoToWorldPipeline`
RBLN-accelerated implementation of Cosmos Video to World pipeline for video-to-video generation.
This pipeline compiles Cosmos Video to World models to run efficiently on RBLN NPUs, enabling high-performance inference for generating videos with distinctive artistic style and enhanced visual quality.
Functions
`from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs)` `classmethod`
Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.
This method has two distinct operating modes:
- When `export=True`: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model.
- When `export=False`: Loads an already compiled RBLN model from `model_id` without recompilation.
It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model_id` | `str` | The model ID or path of the pretrained model to load. | required |
`export` | `bool` | If True, takes a PyTorch model from `model_id` and compiles it for RBLN NPUs. If False, loads an already compiled RBLN model from `model_id`. | `False` |
`model_save_dir` | `Optional[PathLike]` | Directory to save the compiled model artifacts. Only used when `export=True`. | `None` |
`rbln_config` | `Dict[str, Any]` | Configuration options for RBLN compilation. Can include settings for specific submodules. | `{}` |
`lora_ids` | `Optional[Union[str, List[str]]]` | LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when `export=True`. | `None` |
`lora_weights_names` | `Optional[Union[str, List[str]]]` | Names of specific LoRA weight files to load, corresponding to `lora_ids`. Only used when `export=True`. | `None` |
`lora_scales` | `Optional[Union[float, List[float]]]` | Scaling factor(s) to apply to the LoRA adapter(s). Only used when `export=True`. | `None` |
`**kwargs` | `Dict[str, Any]` | Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used. | `{}` |
Returns:
Type | Description |
---|---|
`Self` | A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from `RBLNDiffusionMixin`. |
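The three LoRA parameters each accept either a single value or a list. A small normalizing helper can catch mismatched lengths before a long compilation starts. This is a sketch under assumptions: `lora_export_kwargs` is hypothetical, the one-entry-per-adapter rule is inferred from the phrase "corresponding to lora_ids", and the adapter ID in the usage note is a placeholder. This page does not state whether LoRA adapters exist for a given Cosmos checkpoint.

```python
from typing import Any, Dict, List, Optional, Union

StrOrList = Union[str, List[str]]
FloatOrList = Union[float, List[float]]


def _as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list; copy an existing sequence."""
    return [value] if isinstance(value, (str, int, float)) else list(value)


def lora_export_kwargs(
    lora_ids: StrOrList,
    lora_weights_names: Optional[StrOrList] = None,
    lora_scales: Optional[FloatOrList] = None,
) -> Dict[str, Any]:
    """Build from_pretrained keyword arguments for a LoRA-fused compile."""
    ids = _as_list(lora_ids)
    # LoRA weights are fused during compilation, so export must be True.
    kwargs: Dict[str, Any] = {"export": True, "lora_ids": ids}
    optional = (("lora_weights_names", lora_weights_names), ("lora_scales", lora_scales))
    for key, value in optional:
        if value is None:
            continue
        items = _as_list(value)
        if len(items) != len(ids):  # assumed rule: one entry per adapter
            raise ValueError(f"{key} has {len(items)} entries but lora_ids has {len(ids)}")
        kwargs[key] = items
    return kwargs
```

The result can be unpacked into the call, for example `RBLNCosmosVideoToWorldPipeline.from_pretrained(model_id, **lora_export_kwargs("user/adapter", lora_scales=0.8))`.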
Classes
RBLNCosmosPipelineBaseConfig
Bases: `RBLNModelConfig`
Functions
`__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)`
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`text_encoder` | `Optional[RBLNT5EncoderModelConfig]` | Configuration for the text encoder component. Initialized as `RBLNT5EncoderModelConfig` if not provided. | `None` |
`transformer` | `Optional[RBLNCosmosTransformer3DModelConfig]` | Configuration for the Transformer model component. Initialized as `RBLNCosmosTransformer3DModelConfig` if not provided. | `None` |
`vae` | `Optional[RBLNAutoencoderKLCosmosConfig]` | Configuration for the VAE model component. Initialized as `RBLNAutoencoderKLCosmosConfig` if not provided. | `None` |
`safety_checker` | `Optional[RBLNCosmosSafetyCheckerConfig]` | Configuration for the safety checker component. Initialized as `RBLNCosmosSafetyCheckerConfig` if not provided. | `None` |
`batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
`height` | `Optional[int]` | Height of the generated videos. | `None` |
`width` | `Optional[int]` | Width of the generated videos. | `None` |
`num_frames` | `Optional[int]` | The number of frames in the generated video. | `None` |
`fps` | `Optional[int]` | The frames per second of the generated video. | `None` |
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None` |
`**kwargs` | `Dict[str, Any]` | Additional arguments passed to the parent `RBLNModelConfig`. | `{}` |
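The scalar options above can be gathered into the `rbln_config` dictionary that `from_pretrained` accepts. The sketch below assumes that top-level dictionary keys map onto these constructor parameters. `cosmos_rbln_config` is a hypothetical helper, and the numbers in the usage note are illustrative, not documented defaults.

```python
from typing import Dict, Optional


def cosmos_rbln_config(
    *,
    batch_size: Optional[int] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
    num_frames: Optional[int] = None,
    fps: Optional[int] = None,
    max_seq_len: Optional[int] = None,
) -> Dict[str, int]:
    """Return only the options that were set, rejecting non-positive values."""
    options = {
        "batch_size": batch_size,
        "height": height,
        "width": width,
        "num_frames": num_frames,
        "fps": fps,
        "max_seq_len": max_seq_len,
    }
    for name, value in options.items():
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
    # Omitted options stay None on the config, leaving the library to pick values.
    return {name: value for name, value in options.items() if value is not None}
```

For example, `cosmos_rbln_config(height=704, width=1280, num_frames=121)` yields a dictionary that could be passed as `rbln_config=` together with `export=True`.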
RBLNCosmosTextToWorldPipelineConfig
Bases: `RBLNCosmosPipelineBaseConfig`
Configuration for the Cosmos Text-to-World pipeline.
Functions
`__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)`
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`text_encoder` | `Optional[RBLNT5EncoderModelConfig]` | Configuration for the text encoder component. Initialized as `RBLNT5EncoderModelConfig` if not provided. | `None` |
`transformer` | `Optional[RBLNCosmosTransformer3DModelConfig]` | Configuration for the Transformer model component. Initialized as `RBLNCosmosTransformer3DModelConfig` if not provided. | `None` |
`vae` | `Optional[RBLNAutoencoderKLCosmosConfig]` | Configuration for the VAE model component. Initialized as `RBLNAutoencoderKLCosmosConfig` if not provided. | `None` |
`safety_checker` | `Optional[RBLNCosmosSafetyCheckerConfig]` | Configuration for the safety checker component. Initialized as `RBLNCosmosSafetyCheckerConfig` if not provided. | `None` |
`batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
`height` | `Optional[int]` | Height of the generated videos. | `None` |
`width` | `Optional[int]` | Width of the generated videos. | `None` |
`num_frames` | `Optional[int]` | The number of frames in the generated video. | `None` |
`fps` | `Optional[int]` | The frames per second of the generated video. | `None` |
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None` |
`**kwargs` | `Dict[str, Any]` | Additional arguments passed to the parent `RBLNModelConfig`. | `{}` |
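The constructor forwards extra keyword arguments to the parent config, so a mistyped or borrowed key (such as `unet`, which does not appear in the table above) may not be caught where it is written. The sketch below lists keys that are not named in the parameter table. `unrecognized_keys` is a hypothetical helper, and an unlisted key is not necessarily wrong, since `**kwargs` is passed on to `RBLNModelConfig`.

```python
from typing import Any, Dict, List

# Names taken from the parameter table of the Cosmos pipeline configs.
SUBMODULE_KEYS = ("text_encoder", "transformer", "vae", "safety_checker")
OPTION_KEYS = ("batch_size", "height", "width", "num_frames", "fps", "max_seq_len")


def unrecognized_keys(rbln_config: Dict[str, Any]) -> List[str]:
    """Return the keys that the parameter table does not list, sorted by name."""
    known = set(SUBMODULE_KEYS) | set(OPTION_KEYS)
    return sorted(key for key in rbln_config if key not in known)


def make_text_to_world_config(**options: Any):
    """Instantiate the config after warning about keys worth double-checking."""
    # Imported lazily so unrecognized_keys() works without the NPU stack installed.
    from optimum.rbln import RBLNCosmosTextToWorldPipelineConfig  # assumed import path

    for key in unrecognized_keys(options):
        print(f"note: '{key}' is not a documented Cosmos option; it goes to the parent config")
    return RBLNCosmosTextToWorldPipelineConfig(**options)
```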
RBLNCosmosVideoToWorldPipelineConfig
Bases: `RBLNCosmosPipelineBaseConfig`
Configuration for the Cosmos Video-to-World pipeline.
Functions
`__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)`
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`text_encoder` | `Optional[RBLNT5EncoderModelConfig]` | Configuration for the text encoder component. Initialized as `RBLNT5EncoderModelConfig` if not provided. | `None` |
`transformer` | `Optional[RBLNCosmosTransformer3DModelConfig]` | Configuration for the Transformer model component. Initialized as `RBLNCosmosTransformer3DModelConfig` if not provided. | `None` |
`vae` | `Optional[RBLNAutoencoderKLCosmosConfig]` | Configuration for the VAE model component. Initialized as `RBLNAutoencoderKLCosmosConfig` if not provided. | `None` |
`safety_checker` | `Optional[RBLNCosmosSafetyCheckerConfig]` | Configuration for the safety checker component. Initialized as `RBLNCosmosSafetyCheckerConfig` if not provided. | `None` |
`batch_size` | `Optional[int]` | Batch size for inference, applied to all submodules. | `None` |
`height` | `Optional[int]` | Height of the generated videos. | `None` |
`width` | `Optional[int]` | Width of the generated videos. | `None` |
`num_frames` | `Optional[int]` | The number of frames in the generated video. | `None` |
`fps` | `Optional[int]` | The frames per second of the generated video. | `None` |
`max_seq_len` | `Optional[int]` | Maximum sequence length supported by the model. | `None` |
`**kwargs` | `Dict[str, Any]` | Additional arguments passed to the parent `RBLNModelConfig`. | `{}` |
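Together, `num_frames` and `fps` determine how long the generated clip plays: duration in seconds is `num_frames / fps`. The helper below is a hypothetical illustration of that arithmetic, and the frame counts in the usage note are example values rather than documented defaults.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip with num_frames frames shown at fps frames per second."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must both be positive")
    return num_frames / fps
```

For instance, 120 frames at 30 fps play for exactly 4 seconds, and 121 frames at the same rate play for slightly longer than that.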