Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views
This work addresses the problem of evaluating the perceived video quality of neural view synthesis (NVS) methods, which is crucial for researchers and developers in computer vision and graphics. Its contribution is incremental, since it focuses on assessment rather than on new methods.
The study addresses the lack of perceptual quality assessment for neural view synthesis (NVS) and NeRF methods. It conducts the first perceptual evaluation of these methods on controlled and in-the-wild datasets that include reference videos, analyzes temporal artifacts and distortions, and provides recommendations for dataset and metric selection.
Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free-viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the technique, each typically evaluated on a set of test views using image quality metrics such as PSNR, SSIM, or LPIPS. However, little research has examined how NVS methods perform in terms of perceived video quality. We present the first perceptual evaluation study of NVS and NeRF variants. For this study, we collected two datasets of scenes, one captured in a controlled lab environment and one captured in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods both in a well-controlled perceptual quality assessment experiment and with many existing state-of-the-art image and video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.
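The point that per-image metrics such as PSNR can overlook temporal artifacts can be illustrated with a toy example. The sketch below is not the study's evaluation protocol; it assumes tiny grayscale "frames" stored as flat lists of pixel values, and the function `temporal_error` is a hypothetical measure introduced only for illustration, not one of the metrics evaluated in the study. Two distorted videos with identical per-frame error receive the same average PSNR, even though one of them flickers from frame to frame.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized frames (flat pixel lists)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(max_val ** 2 / err)

def mean_frame_psnr(ref_video, test_video):
    """Average per-frame PSNR, i.e. scoring a video as independent images."""
    scores = [psnr(r, t) for r, t in zip(ref_video, test_video)]
    return sum(scores) / len(scores)

def temporal_error(ref_video, test_video):
    """Hypothetical temporal measure (illustration only): MSE between the
    frame-to-frame differences of the test and reference videos."""
    total = 0.0
    for t in range(1, len(ref_video)):
        d_ref = [b - a for a, b in zip(ref_video[t - 1], ref_video[t])]
        d_test = [b - a for a, b in zip(test_video[t - 1], test_video[t])]
        total += mse(d_ref, d_test)
    return total / (len(ref_video) - 1)

# Static reference "video": 4 frames of a 4-pixel image.
ref = [[100, 120, 140, 160] for _ in range(4)]
# Distortion A: constant +10 offset in every frame (temporally stable error).
stable = [[p + 10 for p in frame] for frame in ref]
# Distortion B: offset alternates between +10 and -10 (flicker).
flicker = [[p + (10 if t % 2 == 0 else -10) for p in frame]
           for t, frame in enumerate(ref)]

# Both videos get the same average per-frame PSNR (about 28.13 dB)...
print(round(mean_frame_psnr(ref, stable), 2), round(mean_frame_psnr(ref, flicker), 2))
# ...but only the flickering one shows a temporal error (0.0 vs 400.0).
print(temporal_error(ref, stable), temporal_error(ref, flicker))
```

Here the per-pixel error magnitude is the same in every frame, so any metric that averages independent per-frame scores cannot separate the two videos; only a comparison across time exposes the flicker, which is why reference video sequences matter.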