CV · LG · IV · Jun 21, 2023

StarVQA+: Co-training Space-Time Attention for Video Quality Assessment

arXiv:2306.12298v1 · 2 citations · h-index: 26
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of evaluating video quality when no pristine reference is available, a common situation in video processing and streaming applications. It is an incremental advance built around a novel training approach.

The paper tackles video quality assessment (VQA) for in-the-wild videos by proposing StarVQA+, a co-trained space-time attention network, and demonstrates its superiority over state-of-the-art methods on multiple datasets.

Self-attention-based Transformers have achieved great success in many computer vision tasks. However, their application to video quality assessment (VQA) has so far been unsatisfactory. Evaluating the quality of in-the-wild videos is challenging because no pristine reference is available and the shooting distortions are unknown. This paper presents a co-trained Space-Time Attention network for the VQA problem, termed StarVQA+. Specifically, we first build StarVQA+ by alternately concatenating divided spatial and temporal attention. Then, to facilitate the training of StarVQA+, we design a vectorized regression loss that encodes the mean opinion score (MOS) as a probability vector and embeds a special token as the learnable variable of the MOS, which better fits the human rating process. Finally, to address the data-hungry nature of Transformers, we propose to co-train the spatial and temporal attention weights using both images and videos. Experiments are conducted on the de facto standard in-the-wild video datasets, including LIVE-Qualcomm, LIVE-VQC, KoNViD-1k, YouTube-UGC, LSVQ, LSVQ-1080p, and DVL2021. The results demonstrate the superiority of the proposed StarVQA+ over state-of-the-art methods.
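The vectorized regression loss described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the MOS is encoded, so the code below assumes a hypothetical scheme: the scalar MOS is turned into a soft probability vector over fixed anchor scores with a Gaussian kernel, the head output for the special MOS token is mapped to a distribution with a softmax, the loss is the cross-entropy between the two vectors, and a scalar score is read back as the expected anchor value. The function names, the anchor grid, `sigma`, and the choice of cross-entropy are all illustrative assumptions, not the paper's implementation.

```python
import math


def encode_mos(mos, anchors, sigma=0.5):
    """Map a scalar MOS to a probability vector over anchor scores.

    Hypothetical Gaussian soft-label scheme (an assumption, not the paper's formula).
    """
    weights = [math.exp(-((mos - a) ** 2) / (2 * sigma ** 2)) for a in anchors]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits):
    """Turn raw outputs for the MOS token into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def vector_regression_loss(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy between the target and predicted vectors (one possible vectorized loss)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def decode_mos(probs, anchors):
    """Read back a scalar score as the expectation of the anchors under the distribution."""
    return sum(p * a for p, a in zip(probs, anchors))


if __name__ == "__main__":
    anchors = [1.0, 2.0, 3.0, 4.0, 5.0]        # assumed 1-5 rating scale
    target = encode_mos(3.7, anchors)          # soft label for a MOS of 3.7
    logits = [0.1, 0.3, 1.2, 2.0, 0.9]         # hypothetical head output for the MOS token
    pred = softmax(logits)
    print("loss:", vector_regression_loss(pred, target))
    print("predicted score:", decode_mos(pred, anchors))
```

Decoding by expectation recovers the encoded MOS exactly only when the soft label is symmetric around it (for example, a MOS at the centre of the anchor grid); elsewhere the result differs slightly, which is one reason an actual implementation might pick a different encoding or decoding rule.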

Code Implementations: 1 repo