CV | Sep 3, 2025

Parameter-Efficient Adaptation of mPLUG-Owl2 via Pixel-Level Visual Prompts for NR-IQA

arXiv:2509.03494v2 | h-index: 23
AI Analysis

This work addresses efficient adaptation of multimodal large language models to low-level vision tasks such as NR-IQA. The approach is novel for NR-IQA but incremental in the broader context of visual prompting.

The paper tackles No-Reference Image Quality Assessment (NR-IQA) by proposing a parameter-efficient adaptation method that uses pixel-level visual prompts for mPLUG-Owl2. It achieves competitive performance with at most 600K trained parameters, reaching a Spearman Rank Correlation Coefficient (SRCC) of 0.93 on KADID-10k.
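For readers unfamiliar with the metric, the following is a minimal sketch of how an SRCC value like the one quoted above is computed: both score lists are converted to ranks (ties share the average rank) and the Pearson correlation of the ranks is taken. The function names `average_ranks` and `srcc` and the sample scores are illustrative assumptions, not taken from the paper or its code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srcc(predicted, mos):
    """Spearman rank correlation between predicted scores and human opinion scores."""
    rp, rm = average_ranks(predicted), average_ranks(mos)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    return cov / (var_p * var_m) ** 0.5


# Hypothetical scores for four images: two adjacent images are ranked in the wrong order.
example = srcc([0.2, 0.5, 0.4, 0.9], [1.0, 2.0, 3.0, 4.0])
```

Because only ranks matter, SRCC rewards a model for ordering images correctly by quality even when its raw scores are on a different scale from the human ratings.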

In this paper, we propose a novel parameter-efficient adaptation method for No-Reference Image Quality Assessment (NR-IQA) using visual prompts optimized in pixel-space. Unlike full fine-tuning of Multimodal Large Language Models (MLLMs), our approach trains only 600K parameters at most (< 0.01% of the base model), while keeping the underlying model fully frozen. During inference, these visual prompts are combined with images via addition and processed by mPLUG-Owl2 with the textual query "Rate the technical quality of the image." Evaluations across distortion types (synthetic, realistic, AI-generated) on KADID-10k, KonIQ-10k, and AGIQA-3k demonstrate competitive performance against fully fine-tuned methods and specialized NR-IQA models, achieving 0.93 SRCC on KADID-10k. To our knowledge, this is the first work to leverage pixel-space visual prompts for NR-IQA, enabling efficient MLLM adaptation for low-level vision tasks. The source code is publicly available at https://github.com/yahya-ben/mplug2-vp-for-nriqa.
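The parameter budget and the additive combination described in the abstract can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the 448x448 RGB input size, the base-model size of roughly 8.2 billion parameters, the border-style prompt variant, the clamping to [0, 1], and all function names. None of it is taken from the authors' released code.

```python
# Sketch only: input size, base-model size, and clamping are assumptions.
ASSUMED_BASE_MODEL_PARAMS = 8.2e9  # assumed size of the frozen MLLM


def full_prompt_params(height, width, channels=3):
    """Trainable values in a prompt that covers every pixel of the image."""
    return channels * height * width


def padding_prompt_params(height, width, pad, channels=3):
    """Trainable values in a border-only prompt of thickness `pad` pixels."""
    inner = (height - 2 * pad) * (width - 2 * pad)
    return channels * (height * width - inner)


def apply_prompt(image, prompt):
    """Add the prompt to the image element-wise, clamping each value to [0, 1].

    Both arguments are nested lists shaped [channel][row][column].
    """
    return [
        [
            [min(1.0, max(0.0, pixel + delta)) for pixel, delta in zip(img_row, prm_row)]
            for img_row, prm_row in zip(img_ch, prm_ch)
        ]
        for img_ch, prm_ch in zip(image, prompt)
    ]


full = full_prompt_params(448, 448)            # 602112 values, roughly 600K
border = padding_prompt_params(448, 448, 30)   # 150480 values
fraction = full / ASSUMED_BASE_MODEL_PARAMS    # share of the assumed base model
```

Under these assumptions, a prompt covering a full 448x448 RGB image has 602,112 trainable values, which is consistent with the "600K parameters at most" figure and falls below 0.01% of the assumed base-model size. During training only the prompt values would receive gradients, while the model weights stay frozen.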

