CV | Apr 28, 2025

Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video

Hoang Chuong Nguyen, Wei Mao, Jose M. Alvarez, Miaomiao Liu

arXiv:2504.19819v16.21 citations | h-index: 2 | Has Code | CVPR

Originality: Incremental advance

AI Analysis

This paper addresses 3D scene reconstruction from monocular videos, a problem relevant to computer vision and graphics applications. The advance appears incremental, as it builds on existing joint optimization methods.

The paper tackles the problem of training Neural Radiance Fields (NeRF) without requiring precomputed camera poses by proposing a method that models continuous camera motions as time-dependent velocities, eliminating dependencies on pose initialization or depth priors. The approach achieves superior camera pose and depth estimation on Co3D and Scannet datasets while maintaining comparable novel-view synthesis performance to state-of-the-art methods.

Neural Radiance Fields (NeRF) have demonstrated superior capability to represent 3D geometry but require accurately precomputed camera poses during training. To mitigate this requirement, existing methods jointly optimize camera poses and NeRF, often relying on good pose initialisation or depth priors. However, these approaches struggle in challenging scenarios, such as large rotations, as they map each camera to a world coordinate system. We propose a novel method that eliminates prior dependencies by modeling continuous camera motions as time-dependent angular velocity and velocity. Relative motions between cameras are learned first via velocity integration, while camera poses can be obtained by aggregating such relative motions up to a world coordinate system defined at a single time step within the video. Specifically, accurate continuous camera movements are learned through a time-dependent NeRF, which captures local scene geometry and motion by training from neighboring frames for each time step. The learned motions enable fine-tuning the NeRF to represent the full scene geometry. Experiments on Co3D and Scannet show that our approach achieves superior camera pose and depth estimation, and novel-view synthesis performance comparable to state-of-the-art methods. Our code is available at https://github.com/HoangChuongNguyen/cope-nerf.
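The two geometric steps named in the abstract, integrating velocities into relative motions and chaining those motions into poses anchored at one reference frame, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a constant angular and linear velocity over each time step, a first-order translation (velocity times step length), and the convention that each relative transform is the pose of frame i+1 expressed in frame i. All function names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def rodrigues(omega, dt):
    """Rotation exp([omega * dt]_x) for a constant angular velocity (Rodrigues' formula)."""
    wx, wy, wz = (w * dt for w in omega)
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    K = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    if theta < 1e-12:
        a, b = 1.0, 0.5  # small-angle limits of sin(t)/t and (1-cos(t))/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    K2 = matmul(K, K)
    I = identity(3)
    return [[I[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]

def relative_motion(omega, vel, dt):
    """4x4 relative transform for one step.
    Assumption: translation is the first-order term vel * dt."""
    R = rodrigues(omega, dt)
    T = identity(4)
    for i in range(3):
        T[i][:3] = R[i]
        T[i][3] = vel[i] * dt
    return T

def invert_rigid(T):
    """Inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    out = identity(4)
    for i in range(3):
        out[i][:3] = Rt[i]
        out[i][3] = -sum(Rt[i][k] * t[k] for k in range(3))
    return out

def aggregate_poses(rel, ref):
    """Chain relative motions into poses expressed in frame `ref`.
    Assumption: rel[i] is the pose of frame i+1 expressed in frame i."""
    n = len(rel) + 1
    poses = [None] * n
    poses[ref] = identity(4)
    for i in range(ref, n - 1):        # frames after the reference
        poses[i + 1] = matmul(poses[i], rel[i])
    for i in range(ref - 1, -1, -1):   # frames before the reference
        poses[i] = matmul(poses[i + 1], invert_rigid(rel[i]))
    return poses

# Usage: four quarter-turns about z while moving along the local x axis.
rel = [relative_motion((0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0), 1.0)
       for _ in range(4)]
poses_from_first = aggregate_poses(rel, ref=0)
poses_from_middle = aggregate_poses(rel, ref=2)
```

Because the world frame is just one frame of the video, changing `ref` changes only the coordinate system in which the poses are expressed. The relative motions themselves stay the same.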

