CV AI Sep 17, 2024

RenderWorld: World Model with Self-Supervised 3D Label

Ziyang Yan, Wenzhen Dong, Yihua Shao, Yuhang Lu, Liu Haiyang, Jingwen Liu, Haozhe Wang, Zhe Wang, Yan Wang, Fabio Remondino, Yuexin Ma

arXiv:2409.11356v225.051 citations h-index: 7

Originality: Incremental advance

AI Analysis

The work addresses the need for cost-effective and reliable autonomous driving systems, though it appears incremental because it builds on existing methods such as Gaussian Splatting and world models.

The paper tackles the problem of vision-only autonomous driving by proposing RenderWorld, a framework that uses self-supervised 3D occupancy labels and a world model for forecasting and planning, achieving state-of-the-art performance in 4D occupancy forecasting and motion planning.

Vision-only end-to-end autonomous driving is not only more cost-effective than LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework that generates 3D occupancy labels using a self-supervised Gaussian-based Img2Occ Module, encodes the labels with AM-VAE, and uses a world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images, which greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. By applying AM-VAE to encode air and non-air separately, RenderWorld achieves a more fine-grained representation of scene elements, leading to state-of-the-art performance in both 4D occupancy forecasting and motion planning with an autoregressive world model.
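
The idea of encoding air and non-air separately can be illustrated with a minimal sketch of the data split alone. This is not AM-VAE or the authors' code: it only shows how a semantic occupancy grid might be divided into an air mask and a non-air label grid, which two separate encoders could then consume. The class id `AIR = 0`, the ignore marker `-1`, and the function names `split_air_non_air` and `merge_air_non_air` are assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[List[int]]]  # occupancy labels indexed as grid[x][y][z]

AIR = 0      # hypothetical class id for empty (air) voxels
IGNORE = -1  # hypothetical marker for "no semantic label here"


def split_air_non_air(occ: Grid, air_id: int = AIR) -> Tuple[Grid, Grid]:
    """Split a semantic occupancy grid into two grids of the same shape.

    air_mask holds 1 where the voxel is air and 0 elsewhere.
    non_air keeps the semantic label of occupied voxels and IGNORE for air.
    """
    air_mask = [[[1 if v == air_id else 0 for v in row] for row in plane]
                for plane in occ]
    non_air = [[[IGNORE if v == air_id else v for v in row] for row in plane]
               for plane in occ]
    return air_mask, non_air


def merge_air_non_air(air_mask: Grid, non_air: Grid, air_id: int = AIR) -> Grid:
    """Recombine the two grids into the original label grid."""
    return [[[air_id if a == 1 else v for a, v in zip(mask_row, label_row)]
             for mask_row, label_row in zip(mask_plane, label_plane)]
            for mask_plane, label_plane in zip(air_mask, non_air)]


# A 2x2x2 toy grid: 0 is air, other integers are made-up semantic classes.
occ = [[[0, 3], [0, 0]],
       [[5, 0], [0, 7]]]
air_mask, non_air = split_air_non_air(occ)
restored = merge_air_non_air(air_mask, non_air)
```

Because the split is lossless, merging the two grids recovers the original labels, so each part can be modelled on its own without discarding information.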
