Towards a Perceived Audiovisual Quality Model for Immersive Content
This work addresses the need for better quality assessment models for immersive multimedia. Its contribution is incremental: it builds on existing metrics and mainly highlights the need for improved databases and testing methods.
The paper tackled the problem of predicting perceived quality for immersive 360-degree video and spatial audio content by conducting subjective experiments and correlating the resulting scores with objective metrics such as PSNR variants. It found that the Cross-Format SPSNR-NN metric had a slightly higher correlation with subjective scores for video, and that a power model showed the highest correlation for audiovisual quality.
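The PSNR variants for 360-degree video differ mainly in where pixels are sampled. As a rough illustration, the sketch below computes a spherical PSNR with nearest-neighbour sampling (SPSNR-NN) between two equirectangular (ERP) frames that may have different resolutions, loosely mirroring the cross-format idea. It is a simplified sketch under stated assumptions, not the implementation used in the study: frames are 2-D lists of 8-bit luma values, a Fibonacci lattice replaces the fixed sphere point set used by reference tools, only ERP-to-ERP comparison is handled (real cross-format metrics also span other projection formats), and the helper names `fibonacci_sphere`, `erp_nearest` and `spsnr_nn` are hypothetical.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly uniform (lon, lat) sample points on the sphere, in radians.

    Assumption: a Fibonacci lattice stands in for the fixed point set that
    reference tools use for S-PSNR.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n  # evenly spaced heights in (-1, 1)
        lat = math.asin(z)
        lon = (i * golden_angle) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
        points.append((lon, lat))
    return points


def erp_nearest(frame, lon, lat):
    """Nearest-neighbour lookup in an ERP frame given as a list of pixel rows."""
    h, w = len(frame), len(frame[0])
    col = math.floor((lon + math.pi) / (2.0 * math.pi) * w) % w  # longitude wraps around
    row = min(max(math.floor((math.pi / 2.0 - lat) / math.pi * h), 0), h - 1)  # latitude is clamped
    return frame[row][col]


def spsnr_nn(ref, test, n_points=10000, peak=255.0):
    """Spherical PSNR with nearest-neighbour sampling (sketch).

    ref and test may have different ERP resolutions: every sphere point is
    mapped into each frame independently, which is the cross-format idea.
    """
    points = fibonacci_sphere(n_points)
    squared_error = 0.0
    for lon, lat in points:
        diff = erp_nearest(ref, lon, lat) - erp_nearest(test, lon, lat)
        squared_error += diff * diff
    mse = squared_error / len(points)
    if mse == 0:
        return math.inf  # frames agree on every sampled point
    return 10.0 * math.log10(peak * peak / mse)
```

Because every score is computed on the sphere rather than on the projected pixel grid, a reference frame and a lower-resolution decoded frame can be compared directly, e.g. `spsnr_nn(reference_erp, decoded_erp)`; how per-frame scores are pooled over a sequence is left open here.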
This paper studies the quality of multimedia content, focusing on 360-degree video and ambisonic spatial audio reproduced using a head-mounted display and a multichannel loudspeaker setup. Encoding parameters were selected following basic video quality test conditions for 360 videos, and a low-bitrate codec was used for audio encoding. Three subjective experiments were performed, for audio, video, and audiovisual content respectively. Peak signal-to-noise ratio (PSNR) and its variants for 360 videos were computed as objective quality metrics and subsequently correlated with the subjective video scores. This study shows that Cross-Format SPSNR-NN has a slightly higher linear and monotonic correlation than the other computed metrics over all video sequences. Among the audiovisual models, a power model shows the highest correlation between test data and predicted scores. We conclude that a high-quality, critical, synchronized audiovisual database is required to enable the development of a superior predictive model. Furthermore, comprehensive assessor training prior to testing may be beneficial to improve the assessors' discrimination ability, particularly with respect to multichannel audio reproduction. To further improve the performance of audiovisual quality models for immersive content, in addition to developing broader and more critical audiovisual databases, the subjective testing methodology needs to evolve to provide greater resolution and robustness.
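The abstract reports linear and monotonic correlations and a power model for audiovisual quality. A minimal sketch of how such an evaluation is commonly done follows, assuming that "linear" and "monotonic" correlation mean the Pearson (PLCC) and Spearman (SROCC) coefficients, and assuming the power form MOS_av = k * MOS_a^alpha * MOS_v^beta fitted by least squares in log space. The paper's exact functional form and fitting procedure may differ, and all function names here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson applied to ranks."""
    return pearson(ranks(x), ranks(y))


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        p = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_power_model(mos_a, mos_v, mos_av):
    """Fit MOS_av ~ k * MOS_a**alpha * MOS_v**beta (assumed form).

    Taking logs gives ln MOS_av = ln k + alpha ln MOS_a + beta ln MOS_v,
    which is solved as ordinary least squares via the normal equations.
    """
    rows = [(1.0, math.log(a), math.log(v)) for a, v in zip(mos_a, mos_v)]
    target = [math.log(av) for av in mos_av]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    log_k, alpha, beta = solve3(ata, atb)
    return math.exp(log_k), alpha, beta


def predict(params, mos_a, mos_v):
    """Predicted audiovisual scores from the fitted power model."""
    k, alpha, beta = params
    return [k * a ** alpha * v ** beta for a, v in zip(mos_a, mos_v)]
```

With per-condition audio, video and audiovisual scores, `pearson(mos_av, predict(fit_power_model(mos_a, mos_v, mos_av), mos_a, mos_v))` gives the correlation between test data and predicted scores. The log-space fit keeps the sketch dependency-free; a nonlinear least-squares fit on the original scale is a common alternative and can give slightly different parameters.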