CL · AI · Sep 18, 2025

A Multi-To-One Interview Paradigm for Efficient MLLM Evaluation

arXiv:2509.14886v1 · h-index: 30
Originality: Incremental advance
AI Analysis

The paper gives researchers who benchmark MLLMs a more efficient evaluation method. It is an incremental advance because it improves an existing evaluation process rather than introducing a new model or task.

The paper tackles inefficient and redundant evaluation of Multi-Modal Large Language Models (MLLMs) by proposing a multi-to-one interview paradigm. Compared with random sampling, it achieves up to 17.6% higher PLCC and 16.7% higher SRCC with full-coverage results while reducing the number of required questions.

The rapid progress of Multi-Modal Large Language Models (MLLMs) has spurred the creation of numerous benchmarks. However, conventional full-coverage Question-Answering evaluations suffer from high redundancy and low efficiency. Inspired by human interview processes, we propose a multi-to-one interview paradigm for efficient MLLM evaluation. Our framework consists of (i) a two-stage interview strategy with pre-interview and formal interview phases, (ii) dynamic adjustment of interviewer weights to ensure fairness, and (iii) an adaptive mechanism for choosing question difficulty levels. Experiments on different benchmarks show that the proposed paradigm achieves significantly higher correlation with full-coverage results than random sampling, with improvements of up to 17.6% in PLCC and 16.7% in SRCC, while reducing the number of required questions. These findings demonstrate that the proposed paradigm provides a reliable and efficient alternative for large-scale MLLM benchmarking.
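
The abstract reports agreement with full-coverage results as PLCC (Pearson linear correlation coefficient) and SRCC (Spearman rank correlation coefficient), with random sampling as the baseline. The sketch below shows one way that baseline comparison could be computed. It is not the paper's code and does not implement the interview paradigm itself. The function names, the 8 models x 500 questions setup, and the 0/1 correctness matrix are all hypothetical, and the data is synthetic.

```python
import math
import random


def plcc(x, y):
    """Pearson linear correlation between two equal-length score lists.

    Undefined (division by zero) if either list is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


def random_subset_scores(correct, k, rng):
    """Per-model accuracy on k questions sampled uniformly without replacement."""
    idx = rng.sample(range(len(correct[0])), k)
    return [sum(row[q] for q in idx) / k for row in correct]


# Synthetic toy data (NOT from the paper): each hypothetical model answers
# each question correctly with a fixed probability between 0.3 and 0.9.
rng = random.Random(0)
n_models, n_questions = 8, 500
abilities = [0.3 + 0.6 * m / (n_models - 1) for m in range(n_models)]
correct = [
    [1 if rng.random() < abilities[m] else 0 for _ in range(n_questions)]
    for m in range(n_models)
]

# Full-coverage score: accuracy over every question in the benchmark.
full_scores = [sum(row) / n_questions for row in correct]
# Random-sampling baseline: accuracy over a 50-question subset.
subset_scores = random_subset_scores(correct, k=50, rng=rng)

print("PLCC vs full coverage:", plcc(full_scores, subset_scores))
print("SRCC vs full coverage:", srcc(full_scores, subset_scores))
```

Any question-selection strategy can be scored the same way: replace `random_subset_scores` with the strategy under test and compare its PLCC and SRCC against the random baseline at the same question budget.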

