CV | Jun 24, 2025

Self-Supervised Multimodal NeRF for Autonomous Driving

arXiv:2506.19615v2 | Has Code | 2025 IEEE Intelligent Vehicles Symposium (IV)
Originality: Incremental advance
AI Analysis

This work addresses the challenge of multimodal scene understanding for autonomous vehicles, offering a self-supervised approach that is incremental over existing dynamic NeRF methods.

The paper tackles the problem of learning neural representations of both static and dynamic scenes in autonomous driving without requiring 3D labels, and reports the best performance among the compared baselines on the KITTI-360 dataset in both the LiDAR and camera domains.

In this paper, we propose a Neural Radiance Fields (NeRF) based framework, referred to as the Novel View Synthesis Framework (NVSF). It jointly learns an implicit neural representation of the space and the time-varying scene for both LiDAR and camera. We test it on a real-world autonomous driving scenario containing both static and dynamic scenes. Unlike existing multimodal dynamic NeRFs, our framework is self-supervised, which eliminates the need for 3D labels. For efficient training and faster convergence, we introduce heuristic-based image pixel sampling that focuses on pixels with rich information. To preserve the local features of LiDAR points, a Double Gradient-based mask is employed. Extensive experiments on the KITTI-360 dataset show that, compared to the baseline models, our framework achieves the best performance in both the LiDAR and camera domains. The code of the model is available at https://github.com/gaurav00700/Selfsupervised-NVSF
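
The abstract does not say which heuristic drives the image pixel sampling, so the following is only an illustrative sketch of the general idea rather than the NVSF implementation. It assumes that "rich information" can be approximated by image gradient magnitude, and it blends in a small uniform term so that flat regions are still sampled occasionally. The function names and the `uniform_mix` parameter are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def gradient_magnitude(image):
    """Per-pixel gradient magnitude of a 2D grayscale image."""
    gy, gx = np.gradient(image.astype(np.float64))
    return np.hypot(gx, gy)


def sample_pixels(image, n_samples, uniform_mix=0.1, rng=None):
    """Draw unique pixel coordinates, favouring high-gradient pixels.

    ASSUMPTION: gradient magnitude stands in for the paper's heuristic.
    `uniform_mix` (hypothetical) is the share of probability mass spread
    uniformly so that textureless pixels keep a nonzero chance.
    Returns an (n_samples, 2) integer array of (row, col) pairs.
    """
    if not 0.0 < uniform_mix <= 1.0:
        raise ValueError("uniform_mix must be in (0, 1]")
    if n_samples > image.size:
        raise ValueError("n_samples exceeds the number of pixels")
    rng = np.random.default_rng() if rng is None else rng

    weights = gradient_magnitude(image).ravel()
    uniform = np.full(weights.size, 1.0 / weights.size)
    total = weights.sum()
    if total > 0:
        probs = (1.0 - uniform_mix) * weights / total + uniform_mix * uniform
    else:
        # Completely flat image: fall back to uniform sampling.
        probs = uniform
    probs = probs / probs.sum()  # guard against floating-point drift

    flat_idx = rng.choice(weights.size, size=n_samples, replace=False, p=probs)
    rows, cols = np.unravel_index(flat_idx, image.shape)
    return np.stack([rows, cols], axis=1)
```

For an image with a single vertical edge, most sampled coordinates should fall on the columns next to the edge, since that is where the gradient is nonzero. In a NeRF training loop, the returned coordinates would select which camera rays to render in a batch.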

Code Implementations: 1 repo
