CV | AI | LG | RO | Nov 2, 2023

Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

arXiv:2311.01017v4 | 117 citations | h-index: 116
Originality: Highly original
AI Analysis

The paper addresses the challenge of learning unsupervised world models for autonomous driving, a concrete advance for robotic applications.

The paper tackles the problem of scaling world models for autonomous driving by proposing Copilot4D, which tokenizes sensor observations with a VQVAE and predicts future states using discrete diffusion. On point cloud forecasting it reduces the prior state-of-the-art Chamfer distance by more than 65% for 1-second predictions and more than 50% for 3-second predictions across the NuScenes, KITTI Odometry, and Argoverse2 datasets.
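For readers unfamiliar with the metric, the following is a minimal sketch of a symmetric Chamfer distance between two point clouds, plus the relative-reduction arithmetic behind figures such as "over 65%". It assumes one common convention (mean of squared nearest-neighbour distances, averaged over both directions); the paper's exact evaluation protocol may differ, and the function names `chamfer_distance` and `relative_reduction` are illustrative, not taken from the paper.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point clouds.

    Assumed convention: squared Euclidean distance to the nearest
    neighbour, averaged per cloud, then averaged over both directions.
    Brute force, O(len(a) * len(b)); fine for small illustrative inputs.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def mean_nearest(src, dst):
        # For each source point, distance to its closest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (mean_nearest(points_a, points_b)
                  + mean_nearest(points_b, points_a))


def relative_reduction(baseline, new):
    """Fraction by which `new` improves on `baseline` (lower is better)."""
    return (baseline - new) / baseline


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
observed = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(predicted, observed))
print(relative_reduction(2.0, 0.5))
```

The second call uses made-up numbers purely to show the arithmetic: a drop from 2.0 to 0.5 is a 75% reduction.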

Learning world models can teach an agent how the world works in an unsupervised manner. Even though it can be viewed as a special case of sequence modeling, progress for scaling world models on robotic applications such as autonomous driving has been somewhat less rapid than scaling language models with Generative Pre-trained Transformers (GPT). We identify two reasons as major bottlenecks: dealing with complex and unstructured observation space, and having a scalable generative model. Consequently, we propose Copilot4D, a novel world modeling approach that first tokenizes sensor observations with VQVAE, then predicts the future via discrete diffusion. To efficiently decode and denoise tokens in parallel, we recast Masked Generative Image Transformer as discrete diffusion and enhance it with a few simple changes, resulting in notable improvement. When applied to learning world models on point cloud observations, Copilot4D reduces prior SOTA Chamfer distance by more than 65% for 1s prediction, and more than 50% for 3s prediction, across NuScenes, KITTI Odometry, and Argoverse2 datasets. Our results demonstrate that discrete diffusion on tokenized agent experience can unlock the power of GPT-like unsupervised learning for robotics.
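The parallel decoding idea mentioned in the abstract can be illustrated with a generic MaskGIT-style loop: start from a fully masked token grid, predict all positions at once, commit the most confident predictions, and keep the rest masked for the next step. This is a sketch of that general technique only, not the paper's discrete diffusion variant (which modifies this recipe). The `MASK` sentinel, the cosine schedule, and `toy_predictor` (a random stand-in for a trained network) are all assumptions made for illustration.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a still-masked token


def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for a trained model: one (token, confidence) pair per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(num_tokens, vocab_size, num_steps=8, seed=0):
    """Iteratively unmask tokens, committing the most confident ones first."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(1, num_steps + 1):
        # Cosine schedule: how many tokens stay masked after this step.
        # It reaches 0 at the final step, so every token gets committed.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / num_steps))
        preds = toy_predictor(tokens, vocab_size, rng)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Rank currently masked positions by predicted confidence.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        num_to_fill = max(len(masked) - keep_masked, 0)
        for i in masked[:num_to_fill]:
            tokens[i] = preds[i][0]
    return tokens


decoded = parallel_decode(num_tokens=16, vocab_size=32)
print(decoded)
```

In a real system the decoded indices would be looked up in the VQVAE codebook and passed to its decoder to reconstruct the predicted observation; that stage is omitted here.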

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
