MDS-VQA: Model-Informed Data Selection for Video Quality Assessment
This work addresses inefficient dataset curation for video quality assessment (VQA) models with a data-centric approach that improves model adaptation and generalization; the contribution is incremental, since it builds on existing VQA methods.
The paper tackles the disconnect between model design and dataset curation in VQA by introducing MDS-VQA, a model-informed data selection mechanism that curates diverse, challenging unlabeled videos for active fine-tuning. With only a 5% selected subset per target domain, mean SRCC rises from 0.651 to 0.722.
Learning-based video quality assessment (VQA) has advanced rapidly, yet progress is increasingly constrained by a disconnect between model design and dataset curation. Model-centric approaches often iterate on fixed benchmarks, while data-centric efforts collect new human labels without systematically targeting the weaknesses of existing VQA models. Here, we describe MDS-VQA, a model-informed data selection mechanism for curating unlabeled videos that are both difficult for the base VQA model and diverse in content. Difficulty is estimated by a failure predictor trained with a ranking objective, and diversity is measured using deep semantic video features, with a greedy procedure balancing the two under a constrained labeling budget. Experiments across multiple VQA datasets and models demonstrate that MDS-VQA identifies diverse, challenging samples that are particularly informative for active fine-tuning. With only a 5% selected subset per target domain, the fine-tuned model improves mean SRCC from 0.651 to 0.722 and achieves the top gMAD rank, indicating strong adaptation and generalization.
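The greedy difficulty-diversity trade-off described above can be illustrated with a minimal sketch. The abstract does not state the exact selection criterion, so the scoring rule here is an assumption, not the authors' method. The hypothetical function `greedy_select` scores each unselected video as `lam * difficulty + (1 - lam) * min_dist`. In this rule, `difficulty` stands in for the failure predictor's output (rescaled to [0, 1]), `min_dist` is the cosine distance from the video's semantic feature vector to its nearest already-selected video, and `lam` is a hypothetical weighting parameter. The labeling budget is counted in videos.

```python
import numpy as np


def greedy_select(difficulty, features, budget, lam=0.5):
    """Greedily pick up to `budget` video indices, balancing difficulty and diversity.

    Hypothetical objective (an assumption, not necessarily MDS-VQA's): at each step
    choose the unselected video maximizing
        lam * difficulty[i] + (1 - lam) * min_dist(i, selected),
    where min_dist is the cosine distance to the nearest already-selected video.
    Feature vectors are assumed to be nonzero.
    """
    difficulty = np.asarray(difficulty, dtype=float)
    feats = np.asarray(features, dtype=float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize rows
    n = len(difficulty)
    budget = min(budget, n)

    # Rescale difficulty to [0, 1] so it is comparable to cosine distance in [0, 2].
    span = difficulty.max() - difficulty.min()
    diff = (difficulty - difficulty.min()) / span if span > 0 else np.zeros(n)

    # Before anything is selected, every candidate gets the maximum cosine
    # distance (2.0), so the first pick is decided by difficulty alone.
    min_dist = np.full(n, 2.0)
    chosen = np.zeros(n, dtype=bool)
    order = []
    for _ in range(budget):
        score = lam * diff + (1.0 - lam) * min_dist
        score[chosen] = -np.inf  # never re-select a video
        j = int(np.argmax(score))
        order.append(j)
        chosen[j] = True
        # Cosine distance from every candidate to the newly selected video.
        dist_to_j = 1.0 - feats @ feats[j]
        min_dist = np.minimum(min_dist, dist_to_j)
    return order


if __name__ == "__main__":
    difficulty = [0.9, 0.85, 0.2, 0.5]  # toy failure-predictor scores
    features = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [-1.0, 0.0]]  # toy semantic features
    print(greedy_select(difficulty, features, budget=3))           # [0, 3, 2]
    print(greedy_select(difficulty, features, budget=3, lam=1.0))  # [0, 1, 3]
```

In this toy run, video 1 is nearly a duplicate of video 0. Once video 0 is chosen, the diversity term for video 1 collapses, so the more distinct videos 3 and 2 are picked ahead of it despite their lower difficulty. Setting `lam=1.0` removes the diversity term and recovers a pure difficulty ranking.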