CL · Mar 12

Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge

Junjie Wu, Xuan Kan, Zihao He, Shunwen Tan, Bo Pan, Kaitai Zhang

arXiv:2603.11665v1 · 6.8 · h-index: 9

Predicted impact: top 60% in CL · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses the need for more reliable evaluation in multimodal AI by improving the generalization of judge models, though it appears incremental because it builds on existing MLLM-as-a-Judge approaches.

The paper targets the difficulty that multimodal LLM-as-a-Judge models have in generalizing across diverse contexts. It proposes MT-RL-Judge, a multi-task reinforcement learning framework that jointly optimizes the judge model across multiple tasks, and reports improved judgment consistency and correlation with human preferences.

Multimodal Large Language Models (MLLMs) have been widely adopted as judges (MLLM-as-a-Judge) due to their strong alignment with human judgment across various visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle to generalize to diverse contexts, a critical requirement for reliable evaluation. To address this limitation, we propose Multi-Task Reinforcement Learning for MLLM-as-a-Judge (MT-RL-Judge), a framework that jointly optimizes the judge model across multiple tasks, leveraging the generalization capabilities of reinforcement learning (RL). Experimental results demonstrate that MT-RL-Judge outperforms several strong baselines in both judgment consistency and correlation with human preferences. Our approach also exhibits robust generalization on out-of-distribution tasks, further validating its effectiveness.
