Sanggil Lee

100.0CVJun 1Code

Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal, Arslan Ali et al.

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 https://openmdw.ai/license/1-1/ License at https://github.com/nvidia/cosmos}{github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3 . The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3 .

CLApr 26, 2019

One-Shot Learning for Text-to-SQL Generation

Dongjun Lee, Jaesik Yoon, Jongyun Song et al.

Most deep learning approaches for text-to-SQL generation are limited to the WikiSQL dataset, which only supports very simple queries. Recently, template-based and sequence-to-sequence approaches were proposed to support complex queries, which contain join queries, nested queries, and other types. However, Finegan-Dollak et al. (2018) demonstrated that both the approaches lack the ability to generate SQL of unseen templates. In this paper, we propose a template-based one-shot learning model for the text-to-SQL generation so that the model can generate SQL of an untrained template based on a single example. First, we classify the SQL template using the Matching Network that is augmented by our novel architecture Candidate Search Network. Then, we fill the variable slots in the predicted template using the Pointer Network. We show that our model outperforms state-of-the-art approaches for various text-to-SQL datasets in two aspects: 1) the SQL generation accuracy for the trained templates, and 2) the adaptability to the unseen SQL templates based on a single example without any additional training.

Sanggil Lee

2 Papers