CV, AI, LG, RO · Mar 18, 2025

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

NVIDIA
arXiv:2503.14492v2 · 73 citations · h-index: 27 · Has Code
Originality: Highly original
AI Analysis

Cosmos-Transfer1 enables highly controllable world generation for applications such as robotics Sim2Real and autonomous vehicle data enrichment, offering a novel method for a known bottleneck.

The paper tackles the problem of generating world simulations conditioned on multiple spatial control inputs, such as segmentation and depth, and demonstrates an inference scaling strategy that achieves real-time world generation on an NVIDIA GB200 NVL72 rack.

We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy to achieve real-time world generation with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
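The abstract's idea of weighting different conditional inputs differently at different spatial locations can be illustrated with a small sketch. This is not the authors' implementation: the function name `fuse_controls`, the scalar-per-pixel feature grids, and the rule of rescaling only where the weights sum above 1 are all assumptions made for illustration. The released code at the repository above is the reference for how the model actually combines its control branches.

```python
from typing import Dict, List

Grid = List[List[float]]  # H x W map holding one scalar per spatial location


def fuse_controls(features: Dict[str, Grid], weights: Dict[str, Grid]) -> Grid:
    """Blend per-modality control signals using per-location weights.

    Hypothetical helper: `features[name]` is the signal contributed by one
    control modality (e.g. "depth", "edge"), and `weights[name]` says how
    much that modality counts at each location.
    """
    names = list(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(weights[n][y][x] for n in names)
            # Assumption: rescale only where the weights sum above 1, so a
            # location never receives more than one unit of total control.
            scale = 1.0 / total if total > 1.0 else 1.0
            fused[y][x] = sum(
                scale * weights[n][y][x] * features[n][y][x] for n in names
            )
    return fused


# Two modalities on a 1 x 2 grid: depth dominates the left location,
# edge dominates the right one.
features = {"depth": [[2.0, 2.0]], "edge": [[4.0, 4.0]]}
weights = {"depth": [[1.0, 0.5]], "edge": [[0.0, 1.5]]}
fused = fuse_controls(features, weights)
print(fused)
```

In this toy setup the left location follows depth alone, while the right location mixes both signals after its weights (summing to 2.0) are rescaled to sum to 1. A weight map of this kind is what lets one region of a frame follow one modality while another region follows a different one.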

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
