CV, IV
Nov 19, 2024

Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment

arXiv:2411.12791v2
5 citations
h-index: 15
Originality: Incremental advance
AI Analysis

This paper addresses a specific limitation of LMMs in image quality assessment and offers a cost-effective solution that requires no retraining. The advance is incremental, however, because it builds on existing LMM capabilities.

The paper addresses the poor performance of large multimodal models (LMMs) in image quality assessment, which it attributes to a perception bias toward semantics over quality. It proposes a training-free debiasing framework that uses semantic-preserving distortions to align quality perception, and reports consistent performance improvements across various IQA datasets.

Despite the impressive performance of large multimodal models (LMMs) in high-level visual tasks, their capacity for image quality assessment (IQA) remains limited. One main reason is that LMMs are primarily trained for high-level tasks (e.g., image captioning), which emphasize extracting consistent image semantics across varying quality. This semantic-aware yet quality-insensitive perception bias inevitably leads to a heavy reliance on image semantics when those LMMs are asked to rate quality. In this paper, instead of costly retraining or fine-tuning of an LMM, we propose a training-free debiasing framework in which the image quality prediction is rectified by mitigating the bias caused by image semantics. Specifically, we first explore several semantic-preserving distortions that significantly degrade image quality while keeping the semantics identifiable. By applying these distortions to the query (test) images, we ensure that the degraded images are recognized as poor quality while their semantics largely remain. During quality inference, both a query image and its corresponding degraded version are fed to the LMM, along with a prompt indicating that the quality of the query image should be inferred under the condition that the degraded one is deemed poor quality. This prior condition effectively aligns the LMM's quality perception, as all degraded images are consistently rated as poor quality regardless of their semantic variance. Finally, the quality scores of the query image inferred under different prior conditions (degraded versions) are aggregated using a conditional probability model. Extensive experiments on various IQA datasets show that our debiasing framework consistently enhances LMM performance.
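
The inference procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three distortions (noise, blur, color quantization), the five-level rating scale, the prompt wording, and the two LMM callbacks `lmm_rate` and `lmm_poor_prob` are all hypothetical stand-ins, and the aggregation is a simple weighted mixture that only approximates the idea of a conditional probability model.

```python
import numpy as np

# Assumed five-level rating scale; the abstract does not specify one.
LEVEL_SCORES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical prompt wording that states the prior condition.
PROMPT = ("The second image is a degraded version of the first and its "
          "quality is poor. Given this, rate the quality of the first image.")

def add_noise(img, sigma=0.2, seed=0):
    """Additive Gaussian noise on an image with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def box_blur(img, k=5):
    """k x k mean filter with edge padding (H x W or H x W x C input)."""
    pad = k // 2
    widths = ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, widths, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape[:2]
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def quantize(img, levels=4):
    """Reduce the number of intensity levels per channel."""
    return np.round(img * (levels - 1)) / (levels - 1)

# Illustrative set of distortions assumed to preserve semantics.
DISTORTIONS = {"noise": add_noise, "blur": box_blur, "quantize": quantize}

def debiased_score(image, lmm_rate, lmm_poor_prob):
    """Aggregate quality predictions made under several prior conditions.

    lmm_rate(query, degraded, prompt) -> probabilities over the 5 levels
    lmm_poor_prob(degraded)           -> probability the image is poor
    Both callbacks are hypothetical wrappers around an LMM.
    """
    dists, weights = [], []
    for distort in DISTORTIONS.values():
        degraded = distort(image)
        probs = np.asarray(lmm_rate(image, degraded, PROMPT), dtype=float)
        dists.append(probs / probs.sum())
        weights.append(float(lmm_poor_prob(degraded)))
    weights = np.asarray(weights)
    if weights.sum() == 0:
        # Fall back to equal weighting if no condition is trusted.
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    mixture = weights @ np.stack(dists)   # mix the conditional distributions
    return float(mixture @ LEVEL_SCORES)  # expected quality level
```

In this sketch, each degraded version defines one prior condition, and conditions that the model is more confident are "poor" contribute more to the final score. Whether the paper weights conditions this way is an assumption.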

