Transformer for Cosmos
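RBLNCosmosTransformer3DModel is an RBLN-optimized version of the core transformer block used in the Cosmos World Foundation Models.
This model replaces the UNet architecture used in earlier Stable Diffusion versions. It carries out the diffusion process by processing latent video representations together with embeddings from the text and video encoders and timestep information.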
Key Classes
- RBLNCosmosTransformer3DModel: The RBLN NPU-accelerated Cosmos Transformer model.
- RBLNCosmosTransformer3DModelConfig: The configuration class for the RBLN Cosmos Transformer model.
Usage in Pipelines
You typically do not interact with RBLNCosmosTransformer3DModel directly. Instead, it is loaded and managed automatically as part of RBLN Cosmos pipelines such as RBLNCosmosTextToWorldPipeline and RBLNCosmosVideoToWorldPipeline.
When configuring an RBLN Cosmos pipeline, you can pass transformer-specific settings through the transformer argument of the pipeline's configuration object.
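As a minimal sketch of passing transformer settings through a pipeline, the snippet below nests them under a "transformer" key in rbln_config. It assumes that RBLNCosmosTextToWorldPipeline is importable from optimum.rbln and that a nested dictionary is accepted in place of the pipeline configuration object; the helper name load_text_to_world_pipeline is hypothetical. The values are the documented defaults written out explicitly, and the import is deferred so the dictionary can be inspected without optimum-rbln or an RBLN NPU.

```python
from typing import Any, Dict

# Transformer-specific settings, nested under the pipeline's "transformer" key.
# Assumption: the pipeline forwards this entry to RBLNCosmosTransformer3DModelConfig.
# The values are the documented defaults, written out explicitly.
rbln_config: Dict[str, Any] = {
    "transformer": {
        "batch_size": 1,
        "num_frames": 121,
        "height": 704,
        "width": 1280,
        "fps": 30,
    },
}


def load_text_to_world_pipeline(model_id: str) -> Any:
    """Hypothetical helper: compile a Cosmos text-to-world pipeline.

    Requires optimum-rbln and an RBLN NPU when called; the import is deferred
    so that defining this function needs neither.
    """
    from optimum.rbln import RBLNCosmosTextToWorldPipeline

    # export=True compiles the PyTorch checkpoint for RBLN NPUs.
    return RBLNCosmosTextToWorldPipeline.from_pretrained(
        model_id, export=True, rbln_config=rbln_config
    )
```

A plain dictionary keeps the sketch independent of the exact name of the pipeline configuration class; if your optimum-rbln version exposes that class, passing an instance with a transformer argument is the route described above.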
API Reference
Classes
RBLNCosmosTransformer3DModel
Bases: RBLNModel
RBLN wrapper for the Cosmos Transformer model.
Functions
from_pretrained(model_id, export=False, rbln_config=None, **kwargs)
classmethod
The from_pretrained() method follows the standard HuggingFace interface.
Use it to load a pre-trained model from the HuggingFace Hub or a local path and convert it into an RBLN model that runs on RBLN NPUs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_id | Union[str, Path] | The ID of the pre-trained model to load. It can be a model ID on the HuggingFace Model Hub, a local path, or the ID of a model already compiled with the RBLN Compiler. | required |
export | bool | Whether to compile the model. | False |
rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. It can be a dictionary or an instance of the model's configuration class (RBLNCosmosTransformer3DModelConfig for this model). | None |
kwargs | Dict[str, Any] | Additional keyword arguments. Arguments prefixed with 'rbln_' are passed to rbln_config; the remaining arguments are passed to the HuggingFace library. | {} |
Returns:
Type | Description |
---|---|
Self | An RBLN model instance ready for inference on RBLN NPU devices. |
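The kwargs routing rule can be illustrated with a small, self-contained sketch. The helper split_rbln_kwargs is hypothetical and is not part of optimum-rbln; it also assumes that the 'rbln_' prefix is stripped before a value is merged into rbln_config, which the table above does not state explicitly.

```python
from typing import Any, Dict, Tuple

PREFIX = "rbln_"


def split_rbln_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate 'rbln_'-prefixed kwargs from the rest.

    Returns (rbln_kwargs, hf_kwargs). Prefixed keys lose the prefix (an
    assumption); all other keys are left for the HuggingFace loader.
    """
    rbln_kwargs = {k[len(PREFIX):]: v for k, v in kwargs.items() if k.startswith(PREFIX)}
    hf_kwargs = {k: v for k, v in kwargs.items() if not k.startswith(PREFIX)}
    return rbln_kwargs, hf_kwargs


rbln_kwargs, hf_kwargs = split_rbln_kwargs(
    {"rbln_batch_size": 1, "rbln_height": 704, "subfolder": "transformer"}
)
print(rbln_kwargs)  # {'batch_size': 1, 'height': 704}
print(hf_kwargs)    # {'subfolder': 'transformer'}
```

Under this reading, rbln_batch_size=1 in a from_pretrained() call has the same effect as rbln_config={"batch_size": 1}, while arguments such as subfolder reach the HuggingFace library unchanged.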
from_model(model, *, rbln_config=None, **kwargs)
classmethod
Converts and compiles a pre-trained HuggingFace model into an RBLN model. This method performs the actual conversion and compilation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | PreTrainedModel | The PyTorch model to compile. It must be an instance of the HuggingFace transformers PreTrainedModel class. | required |
rbln_config | Optional[Union[Dict, RBLNModelConfig]] | Configuration for RBLN model compilation and runtime. It can be a dictionary or an instance of the model's configuration class (RBLNCosmosTransformer3DModelConfig for this model). | None |
kwargs | Dict[str, Any] | Additional keyword arguments. Arguments prefixed with 'rbln_' are passed to rbln_config; the remaining arguments are passed to the HuggingFace library. | {} |
The method performs the following steps:
- Compiles the PyTorch model into an optimized RBLN graph
- Configures the model for the specified NPU device
- Creates the necessary runtime objects if requested
- Saves the compiled model and configurations
Returns:
Type | Description |
---|---|
Self | An RBLN model instance ready for inference on RBLN NPU devices. |
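A minimal sketch of calling from_model() on an already-loaded PyTorch transformer follows. It assumes the diffusers CosmosTransformer3DModel class and a diffusers-format checkpoint with a "transformer" subfolder; the helper name compile_transformer is hypothetical. We also assume that a standalone compile may need settings such as max_seq_len and embedding_dim that a pipeline would otherwise supply, so rbln_config is left open for the caller. Both imports are deferred, so defining the function requires neither library.

```python
from typing import Any, Dict, Optional


def compile_transformer(model_id: str, rbln_config: Optional[Dict[str, Any]] = None) -> Any:
    """Hypothetical helper: load a PyTorch Cosmos transformer and compile it.

    Requires diffusers, optimum-rbln, and an RBLN NPU when called.
    """
    from diffusers import CosmosTransformer3DModel
    from optimum.rbln import RBLNCosmosTransformer3DModel

    # Load the PyTorch weights (assumed to live in the "transformer" subfolder).
    torch_model = CosmosTransformer3DModel.from_pretrained(model_id, subfolder="transformer")
    # Convert and compile for RBLN NPUs; rbln_config may be a dict or an
    # RBLNCosmosTransformer3DModelConfig instance.
    return RBLNCosmosTransformer3DModel.from_model(torch_model, rbln_config=rbln_config)
```

Separating loading from compiling is what distinguishes from_model() from from_pretrained(..., export=True): the caller controls how the PyTorch model is constructed before it is handed over.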
save_pretrained(save_directory)
Saves the model and its configuration files to a directory so that it can be reloaded with the from_pretrained() class method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_directory | Union[str, PathLike] | The directory in which to save the model and its configuration files. It is created if it does not exist. | required |
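Because from_pretrained() also accepts the ID of an already-compiled model, save_pretrained() and from_pretrained(..., export=False) together allow a compile-once, load-many workflow. The sketch below shows that round trip; the helper name save_and_reload is hypothetical, and calling it requires optimum-rbln and an RBLN NPU (the import is deferred until the call).

```python
from pathlib import Path
from typing import Any, Union


def save_and_reload(rbln_model: Any, save_directory: Union[str, Path]) -> Any:
    """Hypothetical helper: persist a compiled RBLNCosmosTransformer3DModel and load it back."""
    from optimum.rbln import RBLNCosmosTransformer3DModel

    # Write the compiled model and its configuration files; the directory is
    # created if it does not exist.
    rbln_model.save_pretrained(save_directory)
    # export=False treats the directory as an already-compiled model, so no
    # recompilation is performed.
    return RBLNCosmosTransformer3DModel.from_pretrained(save_directory, export=False)
```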
Classes
RBLNCosmosTransformer3DModelConfig
Bases: RBLNModelConfig
Configuration class for RBLN Cosmos Transformer models.
Functions
__init__(batch_size=None, num_frames=None, height=None, width=None, fps=None, max_seq_len=None, embedding_dim=None, num_channels_latents=None, num_latent_frames=None, latent_height=None, latent_width=None, **kwargs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size | Optional[int] | The batch size for inference. Defaults to 1. | None |
num_frames | Optional[int] | The number of frames in the generated video. Defaults to 121. | None |
height | Optional[int] | The height of the generated video, in pixels. Defaults to 704. | None |
width | Optional[int] | The width of the generated video, in pixels. Defaults to 1280. | None |
fps | Optional[int] | The frame rate of the generated video, in frames per second. Defaults to 30. | None |
max_seq_len | Optional[int] | The maximum sequence length of the prompt embeddings. | None |
embedding_dim | Optional[int] | The embedding dimension of the prompt embeddings. | None |
num_channels_latents | Optional[int] | The number of channels in the latent space. | None |
latent_height | Optional[int] | The height of the latent representation. | None |
latent_width | Optional[int] | The width of the latent representation. | None |
**kwargs | Dict[str, Any] | Additional arguments passed to the parent RBLNModelConfig. | {} |
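The pixel-space settings (num_frames, height, width) and the latent-space settings (num_latent_frames, latent_height, latent_width) are linked by the video VAE's compression. The sketch below derives latent sizes from pixel sizes, assuming 8x temporal and 8x spatial compression with the first frame encoded on its own, as in the diffusers Cosmos pipelines; the exact factors depend on the VAE checkpoint, and the names latent_shape and LatentShape are hypothetical.

```python
from dataclasses import dataclass

# Assumed compression factors of the Cosmos video VAE (illustrative only).
SPATIAL_COMPRESSION = 8
TEMPORAL_COMPRESSION = 8


@dataclass(frozen=True)
class LatentShape:
    num_latent_frames: int
    latent_height: int
    latent_width: int


def latent_shape(num_frames: int = 121, height: int = 704, width: int = 1280) -> LatentShape:
    """Derive latent dimensions from pixel-space video dimensions."""
    # The first frame maps to one latent frame; each further group of
    # TEMPORAL_COMPRESSION frames maps to one more.
    num_latent_frames = (num_frames - 1) // TEMPORAL_COMPRESSION + 1
    # Height and width shrink by the spatial compression factor.
    return LatentShape(
        num_latent_frames=num_latent_frames,
        latent_height=height // SPATIAL_COMPRESSION,
        latent_width=width // SPATIAL_COMPRESSION,
    )


print(latent_shape())  # LatentShape(num_latent_frames=16, latent_height=88, latent_width=160)
```

With the documented defaults (121 frames at 704 x 1280), these assumed factors give 16 latent frames of 88 x 160.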
Raises:
Type | Description |
---|---|
ValueError | If batch_size is not a positive integer. |
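To make the defaulting and validation behavior concrete, here is a small stand-in class. TransformerSettingsSketch is hypothetical and only mirrors what the tables above document: unset (None) values fall back to the listed defaults, and a non-positive or non-integer batch_size raises ValueError. It is not the real RBLNCosmosTransformer3DModelConfig.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransformerSettingsSketch:
    """Hypothetical stand-in mirroring the documented defaults and batch_size check."""

    batch_size: Optional[int] = None
    num_frames: Optional[int] = None
    height: Optional[int] = None
    width: Optional[int] = None
    fps: Optional[int] = None

    def __post_init__(self) -> None:
        # None means "use the documented default".
        self.batch_size = 1 if self.batch_size is None else self.batch_size
        self.num_frames = 121 if self.num_frames is None else self.num_frames
        self.height = 704 if self.height is None else self.height
        self.width = 1280 if self.width is None else self.width
        self.fps = 30 if self.fps is None else self.fps
        # Mirror the documented check: batch_size must be a positive integer.
        if not isinstance(self.batch_size, int) or self.batch_size <= 0:
            raise ValueError(f"batch_size must be a positive integer, got {self.batch_size!r}")


settings = TransformerSettingsSketch(height=480, width=848)
print(settings.batch_size, settings.height, settings.width)  # 1 480 848

try:
    TransformerSettingsSketch(batch_size=0)
except ValueError as err:
    print(err)  # batch_size must be a positive integer, got 0
```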