CV | AI | Mar 1

Seeing Beyond 8bits: Subjective and Objective Quality Assessment of HDR-UGC Videos

arXiv:2603.00938v1 | h-index: 23
Originality: Highly original
AI Analysis

This work addresses the need for accurate quality assessment of HDR-UGC videos on social platforms and proposes a novel method for this known bottleneck.

The paper tackles perceptual video quality assessment for High Dynamic Range (HDR) user-generated videos, where existing SDR-based models fail because of HDR-specific distortions. It introduces HDR-Q, a multimodal large language model that reports state-of-the-art performance on the authors' curated Beyond8Bits dataset and on public benchmarks.

High Dynamic Range (HDR) user-generated (UGC) videos are rapidly proliferating across social platforms, yet most perceptual video quality assessment (VQA) systems remain tailored to Standard Dynamic Range (SDR). HDR has a higher bit depth, wide color gamut, and elevated luminance range, exposing distortions such as near-black crushing, highlight clipping, banding, and exposure flicker that amplify UGC artifacts and challenge SDR models. To catalyze progress, we curate Beyond8Bits, a large-scale subjective dataset of 44K videos from 6.5K sources with over 1.5M crowd ratings, spanning diverse scenes, capture conditions, and compression settings. We further introduce HDR-Q, the first Multimodal Large Language Model (MLLM) for HDR-UGC VQA. We propose (i) a novel HDR-aware vision encoder to produce HDR-sensitive embeddings, and (ii) HDR-Aware Policy Optimization (HAPO), an RL finetuning framework that anchors reasoning to HDR cues. HAPO augments GRPO via an HDR-SDR contrastive KL that encourages token reliance on HDR inputs and a Gaussian weighted regression reward for fine-grained MOS calibration. Across Beyond8Bits and public HDR-VQA benchmarks, HDR-Q delivers state-of-the-art performance.
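The abstract names three ingredients of HAPO without giving formulas: a Gaussian weighted regression reward, GRPO-style optimization, and an HDR-SDR contrastive KL term. The sketch below is one plausible reading, not the authors' implementation. The function names, the 0-100 MOS scale, the value of `sigma`, and the toy token distributions are all assumptions made for illustration; only the group-normalized advantage follows the standard GRPO recipe.

```python
import math

def gaussian_reward(pred_mos, true_mos, sigma=5.0):
    """Assumed reward shape: 1.0 for an exact MOS match, decaying smoothly
    with squared error. sigma (in MOS points) is a hypothetical setting."""
    return math.exp(-((pred_mos - true_mos) ** 2) / (2.0 * sigma ** 2))

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO step: normalize each reward by the mean and standard
    deviation of its own sampled group, so no learned critic is needed."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same tokens."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy example: four sampled MOS predictions for one video (assumed 0-100 scale).
true_mos = 66.0
predictions = [62.0, 70.0, 55.0, 68.0]
rewards = [gaussian_reward(p, true_mos) for p in predictions]
advantages = group_relative_advantages(rewards)

# Hypothetical next-token distributions for the same prompt when the model
# sees the HDR input versus an SDR version of the same clip.
p_hdr = [0.7, 0.2, 0.1]
p_sdr = [0.4, 0.4, 0.2]
contrast = kl_divergence(p_hdr, p_sdr)
```

In this reading, a larger `contrast` means the model's output depends more strongly on the HDR signal, which is the behavior the abstract says the contrastive KL encourages. How the term is weighted against the reward and combined into the final objective is not stated in the abstract and is left out here.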
