CV | Nov 9, 2025

VDNeRF: Vision-only Dynamic Neural Radiance Field for Urban Scenes

arXiv:2511.06408v1 | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses challenges in autonomous driving and robotic perception by enabling accurate scene reconstruction without expensive sensors. It is an incremental advance, since it builds on existing NeRF-based methods.

The authors tackle the problem of reconstructing dynamic urban scenes without known camera poses. Their method, VDNeRF, jointly estimates camera trajectories and learns spatiotemporal representations, and on urban driving datasets it outperforms state-of-the-art NeRF-based pose-free methods in both camera pose estimation and dynamic novel view synthesis.

Neural Radiance Fields (NeRFs) implicitly model continuous three-dimensional scenes using a set of images with known camera poses, enabling the rendering of photorealistic novel views. However, existing NeRF-based methods encounter challenges in applications such as autonomous driving and robotic perception, primarily due to the difficulty of capturing accurate camera poses and limitations in handling large-scale dynamic environments. To address these issues, we propose Vision-only Dynamic NeRF (VDNeRF), a method that accurately recovers camera trajectories and learns spatiotemporal representations for dynamic urban scenes without requiring additional camera pose information or expensive sensor data. VDNeRF employs two separate NeRF models to jointly reconstruct the scene. The static NeRF model optimizes camera poses and static background, while the dynamic NeRF model incorporates the 3D scene flow to ensure accurate and consistent reconstruction of dynamic objects. To address the ambiguity between camera motion and independent object motion, we design an effective and powerful training framework to achieve robust camera pose estimation and self-supervised decomposition of static and dynamic elements in a scene. Extensive evaluations on mainstream urban driving datasets demonstrate that VDNeRF surpasses state-of-the-art NeRF-based pose-free methods in both camera pose estimation and dynamic novel view synthesis.
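The abstract does not specify how the static and dynamic NeRF models are combined when a pixel is rendered. One formulation used in earlier static/dynamic NeRF decompositions (for example, Neural Scene Flow Fields) composites both fields along each ray: each field contributes its own alpha-weighted colour, while occlusion is governed by the sum of the two densities. The sketch below assumes that formulation; it is not VDNeRF's code, and the function name `composite_static_dynamic` and its plain-list inputs are hypothetical stand-ins for the per-sample network outputs.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def composite_static_dynamic(
    sigma_static: Sequence[float],
    color_static: Sequence[Color],
    sigma_dynamic: Sequence[float],
    color_dynamic: Sequence[Color],
    deltas: Sequence[float],
) -> Tuple[Color, float]:
    """Render one ray by compositing a static and a dynamic radiance field.

    Hypothetical sketch: sample i along the ray has a density and an RGB colour
    from each field, plus the spacing deltas[i] to the next sample. Returns the
    rendered colour and the probability that the ray terminates (1 - final T).
    """
    n = len(deltas)
    if not (len(sigma_static) == len(color_static) == len(sigma_dynamic)
            == len(color_dynamic) == n):
        raise ValueError("all per-sample inputs must have the same length")

    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # probability the ray reaches sample i unoccluded
    for i in range(n):
        # Per-field opacity of this sample (standard NeRF alpha).
        alpha_s = 1.0 - math.exp(-sigma_static[i] * deltas[i])
        alpha_d = 1.0 - math.exp(-sigma_dynamic[i] * deltas[i])
        for k in range(3):
            rgb[k] += transmittance * (alpha_s * color_static[i][k]
                                       + alpha_d * color_dynamic[i][k])
        # Both fields occlude later samples: use the summed density.
        transmittance *= math.exp(-(sigma_static[i] + sigma_dynamic[i]) * deltas[i])
    return (rgb[0], rgb[1], rgb[2]), 1.0 - transmittance


if __name__ == "__main__":
    # An opaque static red sample in front of an opaque dynamic green one:
    # the green sample is fully occluded.
    color, opacity = composite_static_dynamic(
        sigma_static=[1e6, 0.0], color_static=[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
        sigma_dynamic=[0.0, 1e6], color_dynamic=[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        deltas=[1.0, 1.0],
    )
    print(color, opacity)
```

When the dynamic densities are all zero, the function reduces to ordinary single-field NeRF volume rendering, which is what allows a self-supervised split: the static field can explain the background, while the dynamic field only needs non-zero density where independently moving objects appear.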
