AI | May 7

Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric

Ying Gu, Mei Chee Leong, Hui Li Tan, Shangbo Mao, Liyuan Li, Nancy Chen

arXiv:2605.06201 | 57.9 | Has Code

AI Analysis

For researchers and practitioners evaluating MLLMs, this work provides an annotation-free validation metric that captures logical consistency, addressing limitations of accuracy-based evaluation.

The paper proposes a novel metric, VL-LCM, to evaluate vision-language logical consistency of MLLMs without requiring ground-truth annotations, and shows that current MLLMs lag in logical consistency compared to accuracy.

The dominant accuracy-based evaluation might reward unwarranted guessing by Large Language Models, and it might not be applicable for validating models on novel tasks that lack ground-truth (gt) annotation. Based on basic logical principles, we propose a novel framework to evaluate the vision-language logical consistency of MLLMs on both sufficient and necessary cause-effect relations. We define the Vision-Language Logical Consistency Metric (VL-LCM) on traditional multiple-choice VQA (MC-VQA) tests and on recent NaturalBench tests, without the need for gt annotation. Through systematic experiments on the representative VL benchmark MMMU and on recent VL challenges like NaturalBench, we evaluated 11 recent open-source MLLMs from 4 frontier families. Our findings reveal that, despite the significant progress of recent MLLMs on accuracy, logical consistency lags significantly behind. Extensive evaluations of the correlations of VL-LCM with gt-based metrics, the reliability of LCM, and the relation of VL-LCM to the response distribution justify the validity and applicability of VL-LCM even without gt annotation. Our findings suggest that, beyond accuracy, logical consistency could serve as a signal for both accuracy and reliability. VL-LCM can also be employed for MLLM selection, validation, and reliable answer justification in novel tasks without gt annotation.
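To make the idea of an annotation-free consistency check concrete, here is a minimal sketch in Python. It is not the paper's definition of VL-LCM, which the abstract does not spell out. It assumes a hypothetical protocol in which a model first picks one option for a multiple-choice question and is then asked, separately for each option, whether that option is correct. A response set counts as consistent only if the model affirms its own choice and rejects every other option. The function name `consistency_rate` and the record layout are illustrative assumptions.

```python
from typing import Dict, List


def consistency_rate(records: List[Dict]) -> float:
    """Fraction of questions whose responses are mutually consistent.

    Hypothetical record layout (an assumption, not taken from the paper):
      'choice'   : option label picked in the multiple-choice query
      'verdicts' : maps each option label to the model's yes/no answer
                   (True/False) when asked whether that option is correct
    No ground-truth label is used anywhere in this computation.
    """
    if not records:
        raise ValueError("records must not be empty")

    consistent = 0
    for record in records:
        choice = record["choice"]
        verdicts = record["verdicts"]
        # The model should say "yes" to the option it picked.
        affirms_choice = verdicts.get(choice, False)
        # The model should say "no" to every option it did not pick.
        rejects_others = all(
            not verdict for label, verdict in verdicts.items() if label != choice
        )
        if affirms_choice and rejects_others:
            consistent += 1
    return consistent / len(records)


# Made-up responses for illustration only.
records = [
    {"choice": "A", "verdicts": {"A": True, "B": False, "C": False}},   # consistent
    {"choice": "B", "verdicts": {"A": True, "B": True, "C": False}},    # also accepts A
    {"choice": "C", "verdicts": {"A": False, "B": False, "C": False}},  # rejects own choice
    {"choice": "A", "verdicts": {"A": True, "B": False, "C": False}},   # consistent
]
rate = consistency_rate(records)
```

Because the score depends only on agreement among the model's own answers, a check of this kind can run on tasks that have no annotations. Note that a model can be consistent and still wrong.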
