Learning Perceptual Representations for Gaming NR-VQA with Multi-Task FR Signals
This work addresses no-reference video quality assessment for gaming content, which poses distinct challenges such as fast motion and stylized graphics. The contribution appears incremental, however, since it builds on existing multi-task learning approaches.
The paper tackles no-reference video quality assessment (NR-VQA) for gaming videos by proposing MTL-VQA, a multi-task learning framework that uses full-reference (FR) metrics as supervisory signals to learn perceptually meaningful features without human labels. Experiments show that it performs competitively with state-of-the-art NR-VQA methods in both MOS-supervised and label-efficient/self-supervised settings.
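To make "FR metrics as supervisory signals" concrete, here is a minimal sketch. It assumes PSNR and MSE as stand-in FR metrics (the summary does not say which FR metrics MTL-VQA uses) and represents frames as nested lists of 8-bit grayscale values. The helper `fr_targets` is hypothetical. Each distorted clip gets a vector of FR scores computed against its pristine reference. A no-reference model can then be trained to regress those scores from the distorted clip alone, so no human ratings are needed for pretraining.

```python
import math
from typing import Dict, List

Frame = List[List[float]]  # grayscale frame, 8-bit values in [0, 255]


def mse(ref: Frame, dist: Frame) -> float:
    """Mean squared error between two equally sized frames."""
    total, count = 0.0, 0
    for ref_row, dist_row in zip(ref, dist):
        for r, d in zip(ref_row, dist_row):
            total += (r - d) ** 2
            count += 1
    return total / count


def psnr(ref: Frame, dist: Frame, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(ref, dist)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def fr_targets(ref_frames: List[Frame], dist_frames: List[Frame]) -> Dict[str, float]:
    """Hypothetical helper: per-clip FR pseudo-labels averaged over frames.

    The reference is used only to compute these targets; the NR model
    being pretrained would see only dist_frames as input.
    """
    mses = [mse(r, d) for r, d in zip(ref_frames, dist_frames)]
    psnrs = [psnr(r, d) for r, d in zip(ref_frames, dist_frames)]
    return {"mse": sum(mses) / len(mses), "psnr": sum(psnrs) / len(psnrs)}


# Usage: one 2x2 frame where two pixels are off by 10 gray levels.
ref = [[[100.0, 100.0], [100.0, 100.0]]]
dist = [[[110.0, 90.0], [100.0, 100.0]]]
targets = fr_targets(ref, dist)  # {"mse": 50.0, "psnr": ~31.14}
```

In a real pipeline the FR scores would typically come from stronger perceptual metrics and be normalized per task before use as regression targets. The key property is that they are computable for any (reference, distorted) pair without human labeling.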
No-reference video quality assessment (NR-VQA) for gaming videos is challenging because human-rated datasets are scarce and the content has distinctive characteristics, including fast motion, stylized graphics, and compression artifacts. We present MTL-VQA, a multi-task learning framework that uses full-reference (FR) metrics as supervisory signals, so that perceptually meaningful features can be learned during pretraining without human labels. By jointly optimizing multiple FR objectives with adaptive task weighting, our approach learns shared representations that transfer effectively to NR-VQA. Experiments on gaming video datasets show that MTL-VQA performs competitively with state-of-the-art NR-VQA methods in both MOS-supervised and label-efficient/self-supervised settings.
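The abstract does not specify which adaptive task-weighting scheme MTL-VQA uses. As one illustration only, the sketch below implements Dynamic Weight Average (DWA; Liu, Johns, and Davison, 2019), a swapped-in technique rather than the paper's confirmed method. For K tasks, DWA sets the weight of task k at epoch t to K · exp(r_k / T) / Σ_i exp(r_i / T), where r_k = L_k(t−1) / L_k(t−2) is the ratio of that task's losses over the two previous epochs and T is a temperature. The function names and the default T = 2 are assumptions for this sketch.

```python
import math
from typing import List, Sequence


def dwa_weights(
    prev_losses: Sequence[float],
    prev_prev_losses: Sequence[float],
    temperature: float = 2.0,
) -> List[float]:
    """Dynamic Weight Average task weights (illustrative, not MTL-VQA's own).

    A task whose loss is falling slowly (ratio near or above 1) receives a
    larger weight; the weights always sum to the number of tasks K.
    """
    k = len(prev_losses)
    # Loss ratio per task over the last two epochs.
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [k * e / total for e in exps]


def weighted_loss(task_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Joint objective: weighted sum of the per-task FR regression losses."""
    return sum(w * l for w, l in zip(weights, task_losses))


# Usage with two hypothetical FR tasks: task 0 improved by 10%, task 1 by 50%,
# so DWA shifts weight toward the slower-improving task 0.
weights = dwa_weights(prev_losses=[0.9, 0.5], prev_prev_losses=[1.0, 1.0])
joint = weighted_loss([0.9, 0.5], weights)
```

DWA needs two epochs of loss history, so in practice the weights are fixed at 1 for the first two epochs. Gradient-based schemes such as GradNorm or learned uncertainty weighting are common alternatives; which one the paper uses is not stated here.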