CV · Oct 29, 2023

DynPoint: Dynamic Neural Point For View Synthesis

Kaichen Zhou, Jia-Xing Zhong, Sangyun Shin, Kai Lu, Yiyuan Yang, Andrew Markham, Niki Trigoni

arXiv:2310.18999v5 · 19.041 citations · h-index: 49

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and robust view synthesis in dynamic, real-world video scenarios, though the contribution is incremental because it builds on existing neural radiance field methods.

The authors tackle two weaknesses of neural radiance field (NeRF)-based view synthesis: slow training and poor handling of unconstrained or long monocular videos. Their method, DynPoint, predicts 3D correspondences between frames to aggregate information, cutting training time by roughly an order of magnitude while maintaining comparable accuracy.

The introduction of neural radiance fields has greatly improved the effectiveness of view synthesis for monocular videos. However, existing algorithms struggle with uncontrolled or lengthy scenarios and require extensive training specific to each new scenario. To tackle these limitations, we propose DynPoint, an algorithm designed for the rapid synthesis of novel views from unconstrained monocular videos. Rather than encoding all of the scenario information into a latent representation, DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation. Specifically, this correspondence is predicted by estimating consistent depth and scene flow across frames. The acquired correspondence is then used to aggregate information from multiple reference frames into a target frame by constructing hierarchical neural point clouds. The resulting framework enables swift and accurate view synthesis for desired views of target frames. Experimental results show that our method accelerates training considerably, typically by an order of magnitude, while yielding results comparable to prior approaches. Furthermore, our method is robust on long-duration videos without learning a canonical representation of video content.
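The core geometric idea in the abstract (depth plus scene flow gives an explicit 3D correspondence from a reference frame to a target frame, after which point features are aggregated) can be illustrated with a small sketch. This is not DynPoint's implementation: it assumes a pinhole camera with known intrinsics and camera-to-world poses, takes per-pixel depth and world-space scene flow as given inputs (DynPoint estimates these with learned models), and replaces the learned hierarchical neural point aggregation with a single-level inverse-distance-weighted average. All names (`Camera`, `correspond`, `aggregate`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Camera:
    """Pinhole camera (assumed model): intrinsics plus camera-to-world pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Mat3  # rotation, camera-to-world
    t: Vec3  # camera centre in world coordinates


def matvec(M: Mat3, v: Vec3) -> Vec3:
    # 3x3 matrix times 3-vector
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(M: Mat3) -> Mat3:
    # For a rotation matrix, the transpose is its inverse
    return tuple(tuple(M[j][i] for j in range(3)) for i in range(3))


def unproject(cam: Camera, u: float, v: float, depth: float) -> Vec3:
    # Pixel + depth -> camera coordinates -> world coordinates
    p_cam = ((u - cam.cx) * depth / cam.fx, (v - cam.cy) * depth / cam.fy, depth)
    r = matvec(cam.R, p_cam)
    return tuple(r[i] + cam.t[i] for i in range(3))


def project(cam: Camera, p_world: Vec3) -> Tuple[float, float, float]:
    # World -> camera via the inverse rigid transform R^T (p - t), then -> pixel.
    # Returns (u, v, z); z is kept so a caller could test visibility.
    d = tuple(p_world[i] - cam.t[i] for i in range(3))
    x, y, z = matvec(transpose(cam.R), d)
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy, z)


def correspond(ref: Camera, tgt: Camera, u: float, v: float,
               depth: float, flow: Vec3) -> Tuple[float, float, float]:
    """Map a reference-frame pixel into the target frame.

    Unproject with the reference depth, move the 3D point by the scene flow
    (its world-space motion between the two time steps), then reproject.
    """
    p = unproject(ref, u, v, depth)
    p_moved = tuple(p[i] + flow[i] for i in range(3))
    return project(tgt, p_moved)


def aggregate(query: Vec3, points: Sequence[Vec3],
              features: Sequence[Sequence[float]],
              radius: float) -> Optional[List[float]]:
    """Inverse-distance-weighted mean of features of points within `radius`.

    A plain stand-in for learned neural-point aggregation; returns None
    when no point lies inside the radius.
    """
    if not features:
        return None
    total_w = 0.0
    acc = [0.0] * len(features[0])
    for p, f in zip(points, features):
        dist = math.dist(query, p)
        if dist > radius:
            continue
        w = 1.0 / max(dist, 1e-8)  # guard against division by zero
        total_w += w
        for k, val in enumerate(f):
            acc[k] += w * val
    if total_w == 0.0:
        return None
    return [a / total_w for a in acc]
```

For example, with two identical cameras one unit apart along x, a static point at depth 2 seen at the reference image centre lands at u = 0 in the target view, while a scene flow of +1 along x cancels the camera shift and keeps it at the centre. Keeping correspondence (pure geometry) separate from aggregation mirrors the split the abstract describes, so either half could be swapped for a learned component.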

