CL · AI · CV · HC · Feb 20, 2025

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

Henry Hengyuan Zhao, Wenqi Pei, Yifei Tao, Haiyang Mei, Mike Zheng Shou

arXiv:2502.15027v3 · h-index: 7 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better interactive AI assistants, but it is incremental because it focuses on benchmarking rather than on solving the underlying problem.

The paper tackles the problem of assessing the interactive intelligence of Large Multimodal Models (LMMs) with human users. It finds that even state-of-the-art models such as OpenAI-o1 struggle, achieving an average score below 50% when refining their responses based on feedback.

Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework, which can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench which evaluates interactive intelligence using two representative datasets, MMMU-Pro and MathVerse, to test 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-Sonnet-4. Our evaluation results indicate that even the state-of-the-art LMM, OpenAI-o1, struggles to refine its responses based on human feedback, achieving an average score of less than 50%. Our findings point to the need for methods that can enhance LMMs' capabilities to interpret and benefit from feedback.
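The abstract describes InterFeedback only at a high level. As a rough illustration, here is a minimal sketch of an interactive evaluation loop of this kind. It assumes that a model first answers a question, receives feedback whenever its answer is wrong, and may retry for a fixed number of rounds, and that the score is the fraction of initially wrong cases it eventually corrects. The names `interactive_score`, `max_rounds`, `toy_model`, `toy_feedback`, and this scoring rule are hypothetical. They are not taken from the paper's code or metric definitions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    question: str
    answer: str  # reference answer used to judge correctness


# Hypothetical interfaces (assumptions, not the paper's API):
# a model maps (question, feedback history) to an answer;
# a feedback provider maps (question, wrong prediction, reference) to feedback text.
Model = Callable[[str, List[str]], str]
FeedbackProvider = Callable[[str, str, str], str]


def interactive_score(
    cases: List[Case],
    model: Model,
    give_feedback: FeedbackProvider,
    max_rounds: int = 3,
) -> float:
    """Fraction of initially wrong cases fixed within max_rounds of feedback (assumed metric)."""
    initially_wrong = 0
    corrected = 0
    for case in cases:
        history: List[str] = []
        prediction = model(case.question, history)
        if prediction == case.answer:
            continue  # in this sketch, only initially wrong cases are scored
        initially_wrong += 1
        for _ in range(max_rounds):
            # The feedback provider (a human or a simulated one) comments on the wrong answer.
            history.append(give_feedback(case.question, prediction, case.answer))
            prediction = model(case.question, history)
            if prediction == case.answer:
                corrected += 1
                break
    return corrected / initially_wrong if initially_wrong else 0.0


# Hypothetical toy model: answers "A" with no feedback and "B" once it has received any.
def toy_model(question: str, history: List[str]) -> str:
    return "B" if history else "A"


# Hypothetical toy feedback that flags the error without revealing the reference answer.
def toy_feedback(question: str, prediction: str, reference: str) -> str:
    return f"Your answer {prediction} is incorrect; please reconsider."


toy_cases = [Case("q1", "B"), Case("q2", "A"), Case("q3", "C")]
print(interactive_score(toy_cases, toy_model, toy_feedback))  # prints 0.5
```

In the toy run, q2 is correct on the first try and is not scored, q1 is fixed after one round of feedback, and q3 is never fixed, giving 1 corrected out of 2 initially wrong cases. Scoring only the initially wrong cases isolates the ability to benefit from feedback from plain first-attempt accuracy, which is one plausible reading of the "refining responses based on feedback" score described above.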

