LG · RO · Nov 20, 2023

MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations

arXiv:2311.11762v4 · 38 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more actionable and robust world models in autonomous driving, though it appears incremental, since it builds on existing sensor fusion and occupancy prediction techniques.

The authors tackle the problem of improving world models for autonomous driving by combining multimodal sensor data (camera and lidar) with 3D occupancy prediction. They find that this combination improves sensor data prediction, and their analysis exposes weaknesses in current sensor fusion methods.

World models for autonomous driving have the potential to dramatically improve the reasoning capabilities of today's systems. However, most works focus on camera data, with only a few that leverage lidar data or combine both to better represent autonomous vehicle sensor setups. In addition, raw sensor predictions are less actionable than 3D occupancy predictions, but there are no works examining the effects of combining both multimodal sensor data and 3D occupancy prediction. In this work, we perform a set of experiments with a MUltimodal World Model with Geometric VOxel representations (MUVO) to evaluate different sensor fusion strategies to better understand the effects on sensor data prediction. We also analyze potential weaknesses of current sensor fusion approaches and examine the benefits of additionally predicting 3D occupancy.
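To make "geometric voxel representation" and "3D occupancy" concrete, the sketch below maps a lidar point cloud to the set of occupied cells in a regular 3D grid, the kind of binary occupancy target a model could learn to predict. It is a minimal illustration under stated assumptions, not MUVO's implementation: the function `voxelize` and its parameters `voxel_size`, `grid_min`, and `grid_shape` are hypothetical, and the paper's actual grid extent and resolution are not given here.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float,
             grid_min: Point, grid_shape: Voxel) -> Set[Voxel]:
    """Return the indices of voxels that contain at least one point.

    Hypothetical helper: a cubic grid of edge length `voxel_size`,
    anchored at `grid_min`, with `grid_shape` cells along x, y, z.
    """
    occupied: Set[Voxel] = set()
    for p in points:
        # Floor-divide each coordinate offset to get the integer cell index.
        ix, iy, iz = (int((p[d] - grid_min[d]) // voxel_size) for d in range(3))
        # Discard points that fall outside the grid bounds.
        if 0 <= ix < grid_shape[0] and 0 <= iy < grid_shape[1] and 0 <= iz < grid_shape[2]:
            occupied.add((ix, iy, iz))
    return occupied


if __name__ == "__main__":
    cloud = [(0.1, 0.2, 0.3), (0.15, 0.25, 0.35), (1.2, 0.0, 0.0), (10.0, 0.0, 0.0)]
    # Two nearby points share voxel (0, 0, 0); the far point lies outside the 4x4x4 grid.
    print(sorted(voxelize(cloud, 0.5, (0.0, 0.0, 0.0), (4, 4, 4))))
```

Storing only occupied indices keeps the example sparse and dependency-free; a learned model would more typically work with a dense tensor over the whole grid.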

Code Implementations: 1 repo
