Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework
This work addresses the annotation bottleneck in IQA, making lightweight assessment practical under limited annotation budgets, though the contribution is incremental in that it builds on existing MLLM capabilities.
The paper aims to reduce human annotation costs for image quality assessment (IQA) by proposing LEAF, a framework that distills perceptual priors from a multimodal large language model into a lightweight regressor, achieving strong MOS-aligned correlations with minimal supervision.
Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA) tasks. However, adapting such large-scale models is computationally expensive and still relies on substantial Mean Opinion Score (MOS) annotations. We argue that for MLLM-based IQA, the core bottleneck lies not in the quality perception capacity of MLLMs, but in MOS scale calibration. Therefore, we propose LEAF, a Label-Efficient Image Quality Assessment Framework that distills perceptual quality priors from an MLLM teacher into a lightweight student regressor, enabling MOS calibration with minimal human supervision. Specifically, the teacher provides dense supervision through point-wise judgments and pair-wise preferences, together with an estimate of its decision reliability. Guided by these signals, the student learns the teacher's quality perception patterns through joint distillation and is then calibrated on a small MOS subset to align with human annotations. Experiments on both user-generated and AI-generated IQA benchmarks demonstrate that our method significantly reduces the need for human annotations while maintaining strong MOS-aligned correlations, making lightweight IQA practical under limited annotation budgets.
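
The abstract does not state the loss formulation, so the following is only a minimal sketch of how joint distillation followed by MOS calibration could look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the squared-error point-wise term, the logistic (Bradley-Terry style) pair-wise term, normalizing the reliability estimates into weights, the mixing weight `lam`, and the affine calibration fitted by least squares. All function names are hypothetical.

```python
import numpy as np


def pointwise_loss(student, teacher, reliability):
    """Reliability-weighted squared error against teacher point-wise scores (assumed form)."""
    student, teacher = np.asarray(student, float), np.asarray(teacher, float)
    w = np.asarray(reliability, float)
    w = w / w.sum()  # normalize reliabilities into weights (assumption)
    return float(np.sum(w * (student - teacher) ** 2))


def pairwise_loss(student, pairs, prefs, reliability):
    """Reliability-weighted logistic loss on teacher pair-wise preferences (assumed form).

    pairs: (n, 2) integer indices into `student`.
    prefs: 1.0 where the teacher prefers the first image of the pair, else 0.0.
    """
    student = np.asarray(student, float)
    pairs = np.asarray(pairs, int)
    prefs = np.asarray(prefs, float)
    diff = student[pairs[:, 0]] - student[pairs[:, 1]]
    # Signed margin: positive when the student agrees with the teacher's preference.
    z = np.where(prefs > 0.5, diff, -diff)
    w = np.asarray(reliability, float)
    w = w / w.sum()
    # log(1 + exp(-z)), computed stably via logaddexp.
    return float(np.sum(w * np.logaddexp(0.0, -z)))


def joint_loss(student, teacher, point_rel, pairs, prefs, pair_rel, lam=1.0):
    """Sum of the two distillation terms; `lam` is a hypothetical mixing weight."""
    return pointwise_loss(student, teacher, point_rel) + lam * pairwise_loss(
        student, pairs, prefs, pair_rel
    )


def calibrate_to_mos(student_scores, mos):
    """Fit an affine map score -> MOS on a small labeled subset (assumed calibration)."""
    slope, intercept = np.polyfit(
        np.asarray(student_scores, float), np.asarray(mos, float), deg=1
    )
    return float(slope), float(intercept)
```

In this sketch the student would be trained to minimize `joint_loss` over teacher-labeled images, after which `calibrate_to_mos` is fitted once on the small MOS-annotated subset and the resulting slope and intercept are applied to the student's raw scores at inference time.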