LG, RO · Nov 20, 2023

MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations

Daniel Bogdoll, Yitian Yang, Tim Joseph, Melih Yazgan, J. Marius Zöllner

arXiv:2311.11762v422.038 citations · h-index: 10 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more actionable and robust world models in autonomous driving, though it appears incremental because it builds on existing sensor fusion and occupancy prediction techniques.

The authors aim to improve world models for autonomous driving by combining multimodal sensor data (camera and lidar) with 3D occupancy prediction. They find that this approach enhances sensor data prediction and reveals weaknesses in current fusion methods.

World models for autonomous driving have the potential to dramatically improve the reasoning capabilities of today's systems. However, most works focus on camera data, with only a few that leverage lidar data or combine both to better represent autonomous vehicle sensor setups. In addition, raw sensor predictions are less actionable than 3D occupancy predictions, but there are no works examining the effects of combining both multimodal sensor data and 3D occupancy prediction. In this work, we perform a set of experiments with a MUltimodal World Model with Geometric VOxel representations (MUVO) to evaluate different sensor fusion strategies to better understand the effects on sensor data prediction. We also analyze potential weaknesses of current sensor fusion approaches and examine the benefits of additionally predicting 3D occupancy.
