
Cosmos

Cosmos World Foundation Models are specialized for generating videos and world states that accurately adhere to the laws of physics. Built on diffusion-based technology, these models create dynamic, high-quality videos from inputs such as text, images, or videos. RBLN NPUs can accelerate Cosmos pipelines using Optimum RBLN.

Supported Pipelines

Optimum RBLN supports the following Cosmos pipelines:

  • Text-to-Video: Generate high-quality videos from text prompts.
  • Video-to-Video: Generate high-quality videos from input videos and text prompts.

Key Classes

  • RBLNCosmosTextToWorldPipeline: Text-to-Video pipeline
  • RBLNCosmosVideoToWorldPipeline: Video-to-Video pipeline
  • RBLNCosmosTextToWorldPipelineConfig: Configuration for the Text-to-Video pipeline
  • RBLNCosmosVideoToWorldPipelineConfig: Configuration for the Video-to-Video pipeline
  • RBLNCosmosSafetyCheckerConfig: Configuration for the safety checker (guardrail) component

Default Behavior

The Cosmos pipelines include a guardrail model, RBLNCosmosSafetyChecker, by default. It performs three roles:

  • Input Safety: It checks the user's text prompt for inappropriate language.
  • Output Safety: It checks the final generated video for any unsafe content.
  • Facial Blurring: It automatically pixelates any facial areas found in the generated video.
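
The guardrail is enabled by default and must remain enabled (see the license note below). Its sub-components can be placed on specific NPU devices through the pipeline configuration. The sketch below is illustrative only: the device indices are examples, and the role comments are assumptions based on the three roles listed above.

from optimum.rbln import RBLNCosmosTextToWorldPipelineConfig

# Minimal sketch: keep the guardrail enabled and only choose where its
# sub-components run. Device indices are illustrative; role comments are assumptions.
config = RBLNCosmosTextToWorldPipelineConfig(
    safety_checker={
        "aegis": {"tensor_parallel_size": 4, "device": [4, 5, 6, 7]},  # text-prompt safety check (assumed role)
        "siglip_encoder": {"device": 4},      # frame encoding for output safety (assumed role)
        "video_safety_model": {"device": 4},  # unsafe-content detection (assumed role)
        "face_blur_filter": {"device": 4},    # facial pixelation (assumed role)
    }
)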

Important: Cosmos Safety Guardrails

NVIDIA Open Model License

Under the NVIDIA Open Model License, your rights will automatically terminate if you bypass, disable, weaken, or otherwise circumvent the Cosmos safety guardrails.

Usage Example (Text-to-Video)

from diffusers.utils import export_to_video
from optimum.rbln import RBLNCosmosTextToWorldPipeline, RBLNCosmosTextToWorldPipelineConfig

# Create a configuration object (optional, can use defaults)
config = RBLNCosmosTextToWorldPipelineConfig(
    height=704,
    width=1280,
    transformer={
        "tensor_parallel_size": 4,
        "device": [0, 1, 2, 3],
    },
    text_encoder={
        "device": 2,
    },
    vae={
        "device": 3,
    },
    safety_checker={
        "aegis": {
            "tensor_parallel_size": 4,
            "device": [4, 5, 6, 7],
            },
        "siglip_encoder": {"device": 4},
        "video_safety_model": {"device": 4},
        "face_blur_filter": {"device": 4},
    }
)

# Load and compile the Cosmos model for RBLN NPU
pipe = RBLNCosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World",
    export=True,
    rbln_config=config,
)

# Generate a Video
prompt = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
output = pipe(prompt=prompt).frames[0]

# Save the generated video
export_to_video(output, "output.mp4", fps=30)
print("Video saved as output.mp4")

Usage Example (Video-to-Video)

from diffusers.utils import export_to_video, load_video
from optimum.rbln import RBLNCosmosVideoToWorldPipeline, RBLNCosmosVideoToWorldPipelineConfig

# Create a configuration object (optional, can use defaults)
config = RBLNCosmosVideoToWorldPipelineConfig(
    height=704,
    width=1280,
    transformer={
        "tensor_parallel_size": 4,
        "device": [0, 1, 2, 3],
    },
    text_encoder={
        "device": 4,
    },
    vae={
        "device_map": {"encoder": 5, "decoder": 6},
    },
    safety_checker={
        "aegis": {
            "tensor_parallel_size": 4,
            "device": [4, 5, 6, 7]
            },
        "siglip_encoder": {"device": 7},
        "video_safety_model": {"device": 7},
        "face_blur_filter": {"device": 7},
    }
)

# Load and compile the Cosmos model for RBLN NPU
pipe = RBLNCosmosVideoToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Video2World",
    export=True,
    rbln_config=config,
)

# Generate a Video
video = load_video("https://github.com/nvidia-cosmos/cosmos-predict1/raw/refs/heads/main/assets/diffusion/video2world_input1.mp4")
prompt = "A dynamic and visually captivating video showcases a sleek, dark-colored SUV driving along a narrow dirt road that runs parallel to a vast, expansive ocean. The setting is a rugged coastal landscape, with the road cutting through dry, golden-brown grass that stretches across rolling hills. The ocean, a deep blue, extends to the horizon, providing a stunning backdrop to the scene. The SUV moves swiftly along the road, kicking up a trail of dust that lingers in the air behind it, emphasizing the speed and power of the vehicle. The camera maintains a steady tracking shot, following the SUV from a slightly elevated angle, which allows for a clear view of both the vehicle and the surrounding scenery. The lighting is natural, suggesting a time of day when the sun is high, casting minimal shadows and highlighting the textures of the grass and the glint of the ocean. The video captures the essence of freedom and adventure, with the SUV navigating the isolated road with ease, suggesting a journey or exploration theme. The consistent motion of the vehicle and the dust trail create a sense of continuity and fluidity throughout the video, making it engaging and immersive."
output = pipe(video=video, prompt=prompt).frames[0]

# Save the generated video
export_to_video(output, "output.mp4", fps=30)
print("Video saved as output.mp4")

API Reference

Classes

RBLNCosmosTextToWorldPipeline

Bases: RBLNDiffusionMixin, CosmosTextToWorldPipeline

RBLN-accelerated implementation of Cosmos Text to World pipeline for text-to-video generation.

This pipeline compiles Cosmos Text to World models to run efficiently on RBLN NPUs, enabling high-performance inference for generating physics-aware, high-quality videos from text prompts.

Functions

from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod

Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.

This method has two distinct operating modes:

  • When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
  • When export=False: Loads an already compiled RBLN model from model_id without recompilation

It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.

Parameters:

  • model_id (str, required): The model ID or path to the pretrained model to load. Can be either a model ID from the HuggingFace Hub or a local path to a saved model directory.
  • export (bool, default False): If True, takes a PyTorch model from model_id and compiles it for RBLN NPU execution. If False, loads an already compiled RBLN model from model_id without recompilation.
  • model_save_dir (Optional[PathLike], default None): Directory to save the compiled model artifacts. Only used when export=True. If not provided and export=True, a temporary directory is used.
  • rbln_config (Dict[str, Any], default {}): Configuration options for RBLN compilation. Can include settings for specific submodules such as text_encoder, unet, and vae. Configuration can be tailored to the specific pipeline being compiled.
  • lora_ids (Optional[Union[str, List[str]]], default None): LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when export=True.
  • lora_weights_names (Optional[Union[str, List[str]]], default None): Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when export=True.
  • lora_scales (Optional[Union[float, List[float]]], default None): Scaling factor(s) to apply to the LoRA adapter(s). Only used when export=True.
  • **kwargs (Dict[str, Any]): Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used.

Returns:

  • Self: A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin.
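
The lora_* arguments take effect only when export=True, because LoRA weights are fused into the model weights during compilation. A hedged sketch follows; the adapter repository and weight-file name are hypothetical placeholders, and compatibility of any particular adapter with the Cosmos transformer is an assumption.

from optimum.rbln import RBLNCosmosTextToWorldPipeline

# Hedged sketch: fuse a (hypothetical) LoRA adapter at compile time.
pipe = RBLNCosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World",
    export=True,                                            # LoRA fusion happens only during compilation
    lora_ids="your-org/your-cosmos-lora",                   # hypothetical adapter ID or local path
    lora_weights_names="pytorch_lora_weights.safetensors",  # assumed weight-file name inside the adapter
    lora_scales=0.8,                                        # scaling factor applied before fusion
)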

Classes

RBLNCosmosVideoToWorldPipeline

Bases: RBLNDiffusionMixin, CosmosVideoToWorldPipeline

RBLN-accelerated implementation of Cosmos Video to World pipeline for video-to-video generation.

This pipeline compiles Cosmos Video to World models to run efficiently on RBLN NPUs, enabling high-performance inference for generating physics-aware, high-quality videos from input videos and text prompts.

Functions

from_pretrained(model_id, *, export=False, model_save_dir=None, rbln_config={}, lora_ids=None, lora_weights_names=None, lora_scales=None, **kwargs) classmethod

Load a pretrained diffusion pipeline from a model checkpoint, with optional compilation for RBLN NPUs.

This method has two distinct operating modes:

  • When export=True: Takes a PyTorch-based diffusion model, compiles it for RBLN NPUs, and loads the compiled model
  • When export=False: Loads an already compiled RBLN model from model_id without recompilation

It supports various diffusion pipelines including Stable Diffusion, Kandinsky, ControlNet, and other diffusers-based models.

Parameters:

  • model_id (str, required): The model ID or path to the pretrained model to load. Can be either a model ID from the HuggingFace Hub or a local path to a saved model directory.
  • export (bool, default False): If True, takes a PyTorch model from model_id and compiles it for RBLN NPU execution. If False, loads an already compiled RBLN model from model_id without recompilation.
  • model_save_dir (Optional[PathLike], default None): Directory to save the compiled model artifacts. Only used when export=True. If not provided and export=True, a temporary directory is used.
  • rbln_config (Dict[str, Any], default {}): Configuration options for RBLN compilation. Can include settings for specific submodules such as text_encoder, unet, and vae. Configuration can be tailored to the specific pipeline being compiled.
  • lora_ids (Optional[Union[str, List[str]]], default None): LoRA adapter ID(s) to load and apply before compilation. LoRA weights are fused into the model weights during compilation. Only used when export=True.
  • lora_weights_names (Optional[Union[str, List[str]]], default None): Names of specific LoRA weight files to load, corresponding to lora_ids. Only used when export=True.
  • lora_scales (Optional[Union[float, List[float]]], default None): Scaling factor(s) to apply to the LoRA adapter(s). Only used when export=True.
  • **kwargs (Dict[str, Any]): Additional arguments to pass to the underlying diffusion pipeline constructor or the RBLN compilation process. These may include parameters specific to individual submodules or the particular diffusion pipeline being used.

Returns:

  • Self: A compiled diffusion pipeline that can be used for inference on RBLN NPU. The returned object is an instance of the class that called this method, inheriting from RBLNDiffusionMixin.

Classes

RBLNCosmosPipelineBaseConfig

Bases: RBLNModelConfig

Functions

__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)

Parameters:

  • text_encoder (Optional[RBLNT5EncoderModelConfig], default None): Configuration for the text encoder component. Initialized as RBLNT5EncoderModelConfig if not provided.
  • transformer (Optional[RBLNCosmosTransformer3DModelConfig], default None): Configuration for the Transformer model component. Initialized as RBLNCosmosTransformer3DModelConfig if not provided.
  • vae (Optional[RBLNAutoencoderKLCosmosConfig], default None): Configuration for the VAE model component. Initialized as RBLNAutoencoderKLCosmosConfig if not provided.
  • safety_checker (Optional[RBLNCosmosSafetyCheckerConfig], default None): Configuration for the safety checker component. Initialized as RBLNCosmosSafetyCheckerConfig if not provided.
  • batch_size (Optional[int], default None): Batch size for inference, applied to all submodules.
  • height (Optional[int], default None): Height of the generated videos.
  • width (Optional[int], default None): Width of the generated videos.
  • num_frames (Optional[int], default None): The number of frames in the generated video.
  • fps (Optional[int], default None): The frames per second of the generated video.
  • max_seq_len (Optional[int], default None): Maximum sequence length supported by the model.
  • **kwargs (Dict[str, Any]): Additional arguments passed to the parent RBLNModelConfig.

RBLNCosmosTextToWorldPipelineConfig

Bases: RBLNCosmosPipelineBaseConfig

Config for Cosmos Text2World Pipeline

Functions

__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)

Parameters:

  • text_encoder (Optional[RBLNT5EncoderModelConfig], default None): Configuration for the text encoder component. Initialized as RBLNT5EncoderModelConfig if not provided.
  • transformer (Optional[RBLNCosmosTransformer3DModelConfig], default None): Configuration for the Transformer model component. Initialized as RBLNCosmosTransformer3DModelConfig if not provided.
  • vae (Optional[RBLNAutoencoderKLCosmosConfig], default None): Configuration for the VAE model component. Initialized as RBLNAutoencoderKLCosmosConfig if not provided.
  • safety_checker (Optional[RBLNCosmosSafetyCheckerConfig], default None): Configuration for the safety checker component. Initialized as RBLNCosmosSafetyCheckerConfig if not provided.
  • batch_size (Optional[int], default None): Batch size for inference, applied to all submodules.
  • height (Optional[int], default None): Height of the generated videos.
  • width (Optional[int], default None): Width of the generated videos.
  • num_frames (Optional[int], default None): The number of frames in the generated video.
  • fps (Optional[int], default None): The frames per second of the generated video.
  • max_seq_len (Optional[int], default None): Maximum sequence length supported by the model.
  • **kwargs (Dict[str, Any]): Additional arguments passed to the parent RBLNModelConfig.
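
A minimal sketch of constructing this config directly; the num_frames and max_seq_len values are illustrative assumptions, not checkpoint defaults. The resulting object is passed to from_pretrained as rbln_config, as in the usage examples above.

from optimum.rbln import RBLNCosmosTextToWorldPipelineConfig

# Shared generation settings are set once on the pipeline config and propagated
# to the submodule configs (text_encoder, transformer, vae, safety_checker).
config = RBLNCosmosTextToWorldPipelineConfig(
    batch_size=1,
    height=704,
    width=1280,
    num_frames=121,   # illustrative frame count (assumption)
    fps=30,
    max_seq_len=512,  # illustrative maximum text sequence length (assumption)
)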

RBLNCosmosVideoToWorldPipelineConfig

Bases: RBLNCosmosPipelineBaseConfig

Config for Cosmos Video2World Pipeline

Functions

__init__(text_encoder=None, transformer=None, vae=None, safety_checker=None, *, batch_size=None, height=None, width=None, num_frames=None, fps=None, max_seq_len=None, **kwargs)

Parameters:

  • text_encoder (Optional[RBLNT5EncoderModelConfig], default None): Configuration for the text encoder component. Initialized as RBLNT5EncoderModelConfig if not provided.
  • transformer (Optional[RBLNCosmosTransformer3DModelConfig], default None): Configuration for the Transformer model component. Initialized as RBLNCosmosTransformer3DModelConfig if not provided.
  • vae (Optional[RBLNAutoencoderKLCosmosConfig], default None): Configuration for the VAE model component. Initialized as RBLNAutoencoderKLCosmosConfig if not provided.
  • safety_checker (Optional[RBLNCosmosSafetyCheckerConfig], default None): Configuration for the safety checker component. Initialized as RBLNCosmosSafetyCheckerConfig if not provided.
  • batch_size (Optional[int], default None): Batch size for inference, applied to all submodules.
  • height (Optional[int], default None): Height of the generated videos.
  • width (Optional[int], default None): Width of the generated videos.
  • num_frames (Optional[int], default None): The number of frames in the generated video.
  • fps (Optional[int], default None): The frames per second of the generated video.
  • max_seq_len (Optional[int], default None): Maximum sequence length supported by the model.
  • **kwargs (Dict[str, Any]): Additional arguments passed to the parent RBLNModelConfig.