CVMar 20

One Model, Two Minds: Task-Conditioned Reasoning for Unified Image Quality and Aesthetic Assessment

arXiv:2603.1977986.92 citationsh-index: 4Has Code
Predicted impact top 20% in CV · last 90 daysOriginality Incremental advance
AI Analysis

This work addresses the challenge of efficiently handling multiple perceptual scoring tasks for computer vision applications, though it is incremental in improving existing unified approaches.

The paper tackles the problem of unifying Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) in a single model by identifying mismatches in reasoning and optimization, and proposes TATAR, a task-conditioned framework that outperforms prior unified baselines across eight benchmarks.

Unifying Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) in a single multimodal large language model is appealing, yet existing methods adopt a task-agnostic recipe that applies the same reasoning strategy and reward to both tasks. We show this is fundamentally misaligned: IQA relies on low-level, objective perceptual cues and benefits from concise distortion-focused reasoning, whereas IAA requires deliberative semantic judgment and is poorly served by point-wise score regression. We identify these as a reasoning mismatch and an optimization mismatch, and provide empirical evidence for both through controlled probes. Motivated by these findings, we propose TATAR (Task-Aware Thinking with Asymmetric Rewards), a unified framework that shares the visual-language backbone while conditioning post-training on each task's nature. TATAR combines three components: fast--slow task-specific reasoning construction that pairs IQA with concise perceptual rationales and IAA with deliberative aesthetic narratives; two-stage SFT+GRPO learning that establishes task-aware behavioral priors before reward-driven refinement; and asymmetric rewards that apply Gaussian score shaping for IQA and Thurstone-style completion ranking for IAA. Extensive experiments across eight benchmarks demonstrate that TATAR consistently outperforms prior unified baselines on both tasks under in-domain and cross-domain settings, remains competitive with task-specific specialized models, and yields more stable training dynamics for aesthetic assessment. Our results establish task-conditioned post-training as a principled paradigm for unified perceptual scoring. Our code is publicly available at https://github.com/yinwen2019/TATAR.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes