CV | IV | Jun 12, 2025

Research on Audio-Visual Quality Assessment Dataset and Method for User-Generated Omnidirectional Video

arXiv:2506.10331v1 | h-index: 4 | ICME
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for quality assessment tools for UGC-ODVs in the Metaverse, but it is incremental: it builds on existing AVQA methods and applies them to a new dataset.

The researchers tackled the lack of audio-visual quality assessment (AVQA) for user-generated omnidirectional videos (UGC-ODVs) by constructing a dataset of 300 videos captured with two types of omnidirectional cameras and by developing a baseline model, which they report achieves the best performance on this dataset.

In response to the rising prominence of the Metaverse, omnidirectional videos (ODVs) have garnered notable interest, gradually shifting from professional-generated content (PGC) to user-generated content (UGC). However, the study of audio-visual quality assessment (AVQA) for ODVs remains limited. To address this, we construct a dataset of UGC omnidirectional audio and video (A/V) content. Five individuals captured 300 videos covering 10 different scene types, using two different types of omnidirectional cameras. A subjective AVQA experiment is conducted on the dataset to obtain the Mean Opinion Scores (MOSs) of the A/V sequences. To facilitate the development of the UGC-ODV AVQA field, we then construct an effective AVQA baseline model on the proposed dataset. The model consists of a video feature extraction module, an audio feature extraction module, and an audio-visual fusion module. The experimental results demonstrate that our model achieves optimal performance on the proposed dataset.
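The abstract mentions obtaining Mean Opinion Scores from a subjective experiment. As a rough illustration only, a MOS is conventionally the average of the subjects' ratings for each sequence. The sketch below assumes a 1 to 5 rating scale, four subjects, and no outlier screening; none of these details come from the paper, and the sequence ids and the function name `mean_opinion_scores` are hypothetical.

```python
from statistics import fmean


def mean_opinion_scores(ratings):
    """Return a dict mapping each sequence id to the mean of its ratings.

    `ratings` maps a sequence id to a list with one score per subject.
    This is the plain arithmetic mean; the paper's actual procedure
    (scale, subject screening, normalisation) is not stated in the abstract.
    """
    return {seq_id: fmean(scores) for seq_id, scores in ratings.items()}


# Hypothetical ratings on an assumed 1-5 scale (not data from the paper).
ratings = {
    "seq_001": [4, 5, 4, 3],
    "seq_002": [2, 1, 2, 3],
}

mos = mean_opinion_scores(ratings)
for seq_id, score in mos.items():
    print(f"{seq_id}: MOS = {score:.2f}")
```

A real study would typically add subject screening and possibly score normalisation before averaging; those steps are omitted here because the abstract does not describe them.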

