CV · MM · Jul 23, 2024

QPT V2: Masked Image Modeling Advances Visual Scoring

arXiv:2407.16541v1 · 6 citations · h-index: 9 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of labeled data for visual quality and aesthetics assessment, offering a unified solution that improves generalization, though it is incremental as it adapts existing MIM techniques to a new domain.

The paper tackled the problem of limited labeled data and poor generalization in visual quality and aesthetics assessment by proposing QPT V2, a masked image modeling-based pretraining framework, which achieved superior performance on 11 benchmarks compared to state-of-the-art methods.

Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually generalize sub-optimally. Masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification, detection). In this work, we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive the high-level semantics and fine-grained details, pretraining data is curated. To comprehensively encompass quality- and aesthetics-related factors, degradation is introduced. To capture multi-scale quality and aesthetic information, model structure is modified. Extensive experimental results on 11 downstream benchmarks clearly show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms. Code and models will be released at https://github.com/KeiChiTse/QPT-V2.
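
For readers unfamiliar with masked image modeling, the sketch below illustrates the generic idea the abstract builds on: hide a random subset of image patches and score the reconstruction only on the hidden ones. It is a toy illustration in plain Python, not the QPT V2 implementation. The function names `random_patch_mask` and `masked_mse`, the 75% mask ratio, and the flat-list patch representation are all assumptions made for this example; the abstract does not specify the masking strategy, loss, or degradation pipeline.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Pick which patches to hide. True means the patch is masked.

    Hypothetical helper: the mask ratio and uniform sampling are
    assumptions, not details taken from the paper.
    """
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(target_patches, predicted_patches, mask):
    """Mean squared error computed over masked patches only.

    Each patch is a flat list of pixel values. Visible patches do not
    contribute to the loss, as in common MIM setups.
    """
    total, count = 0.0, 0
    for target, predicted, is_masked in zip(target_patches, predicted_patches, mask):
        if is_masked:
            total += sum((t - p) ** 2 for t, p in zip(target, predicted))
            count += len(target)
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # 16 toy patches of 4 pixels each; a real model would predict these.
    target = [[float(i)] * 4 for i in range(16)]
    predicted = [[0.0] * 4 for _ in range(16)]
    mask = random_patch_mask(len(target), mask_ratio=0.75, rng=rng)
    print("masked patches:", sum(mask))
    print("reconstruction loss:", masked_mse(target, predicted, mask))
```

In an actual pretraining run, an encoder would see only the visible patches and a decoder would produce `predicted_patches`; here the prediction is a placeholder so the masking and loss logic can be read in isolation.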

Code Implementations: 1 repo
