IV · CV · May 19

FGSVQA: Frequency-Guided Short-form Video Quality Assessment

arXiv:2605.20016 · 64.9 · Has Code
Predicted impact: top 8% in IV (last 90 days) · Originality: incremental advance
AI Analysis

This work addresses the need for accurate quality assessment of short-form user-generated content, which is difficult because of complex distortions and rapid content variation.

The paper tackles short-form video quality assessment by proposing a CLIP-based framework that uses frequency-domain compression priors to generate artifact- and structure-aware weight maps, achieving an SRCC of 0.736 and a PLCC of 0.787 on short-form video datasets.
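For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) are typically computed between predicted scores and subjective scores. The function names and toy data are illustrative assumptions, not taken from the paper's code. Note that VQA work often applies a nonlinear logistic mapping to predictions before computing PLCC; that step is omitted here, and the listing does not say whether the authors use it.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(mos))


def plcc(pred, mos):
    """Pearson linear correlation on the raw scores (no logistic mapping)."""
    return pearson(pred, mos)


# Toy data (hypothetical): predictions are monotonic in the subjective
# scores but not linear, so SRCC is perfect while PLCC is slightly lower.
predicted = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 4.0, 9.0, 16.0]
print(srcc(predicted, subjective), plcc(predicted, subjective))
```

SRCC rewards getting the ordering of videos right, whereas PLCC also penalizes departures from a linear relationship, which is why the two numbers usually differ.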

Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address this challenge, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP, and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposing artifact, structure, and original visual feature branches and adaptively fusing them over time through a learned gating module, the proposed method achieves accurate and efficient quality prediction. Experimental results show that our method achieves strong performance on short-form video datasets in terms of average rank and linear correlation (SRCC: 0.736, PLCC: 0.787), while maintaining efficient inference runtime. The code and additional results are available at: https://github.com/xinyiW915/FGSVQA.
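The aggregation idea in the abstract can be illustrated with a small sketch: patch features from one frame are pooled three ways (artifact-weighted, structure-weighted, and unweighted), the three pooled vectors are mixed by a softmax gate, and the per-frame results are averaged over time. Everything below is an assumption made for illustration: the weight maps and gate logits are passed in as plain numbers, whereas in the paper they come from frequency-domain priors and a learned gating module whose details are not given in this listing. The function names are hypothetical and are not the API of the FGSVQA repository.

```python
import math


def weighted_pool(patch_feats, weights):
    """Pool N patch feature vectors into one vector using per-patch weights."""
    total = sum(weights)
    if total <= 0:
        # Degenerate weight map: fall back to uniform averaging.
        weights = [1.0] * len(patch_feats)
        total = float(len(patch_feats))
    dim = len(patch_feats[0])
    return [
        sum(w * f[d] for w, f in zip(weights, patch_feats)) / total
        for d in range(dim)
    ]


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def fuse_frame(patch_feats, artifact_w, structure_w, gate_logits):
    """Mix artifact, structure, and original branches for a single frame.

    gate_logits stands in for the output of a learned gating module
    (assumed here to produce one logit per branch).
    """
    uniform = [1.0] * len(patch_feats)
    branches = [
        weighted_pool(patch_feats, artifact_w),   # artifact-aware branch
        weighted_pool(patch_feats, structure_w),  # structure-aware branch
        weighted_pool(patch_feats, uniform),      # original features
    ]
    gates = softmax(gate_logits)
    dim = len(branches[0])
    return [sum(g * b[d] for g, b in zip(gates, branches)) for d in range(dim)]


def fuse_video(frames):
    """Average fused frame vectors over time (a simple stand-in for temporal fusion)."""
    fused = [fuse_frame(*frame) for frame in frames]
    dim = len(fused[0])
    return [sum(f[d] for f in fused) / len(fused) for d in range(dim)]


# Toy frame (hypothetical): two patches with 2-D features.
feats = [[1.0, 0.0], [0.0, 1.0]]
frame = (feats, [1.0, 0.0], [0.0, 1.0], [0.0, 0.0, 0.0])
print(fuse_frame(*frame))
```

With equal gate logits the three branches contribute equally; raising one logit shifts the fused vector toward that branch, which is the behaviour an adaptive gate is meant to provide.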

