CVAICLAug 28, 2024

A Survey on Evaluation of Multimodal Large Language Models

arXiv:2408.15769v156 citationsh-index: 19
Originality Synthesis-oriented
AI Analysis

It addresses the need for standardized evaluation in the rapidly evolving field of MLLMs, which is crucial for advancing towards artificial general intelligence, but is incremental as it synthesizes existing methods rather than introducing new ones.

This paper systematically reviews evaluation methods for Multimodal Large Language Models (MLLMs), categorizing tasks, benchmarks, and metrics to assess capabilities like perception and reasoning, aiming to guide researchers in developing more capable and reliable MLLMs.

Multimodal Large Language Models (MLLMs) mimic human perception and reasoning system by integrating powerful Large Language Models (LLMs) with various modality encoders (e.g., vision, audio), positioning LLMs as the "brain" and various modality encoders as sensory organs. This framework endows MLLMs with human-like capabilities, and suggests a potential pathway towards achieving artificial general intelligence (AGI). With the emergence of all-round MLLMs like GPT-4V and Gemini, a multitude of evaluation methods have been developed to assess their capabilities across different dimensions. This paper presents a systematic and comprehensive review of MLLM evaluation methods, covering the following key aspects: (1) the background of MLLMs and their evaluation; (2) "what to evaluate" that reviews and categorizes existing MLLM evaluation tasks based on the capabilities assessed, including general multimodal recognition, perception, reasoning and trustworthiness, and domain-specific applications such as socioeconomic, natural sciences and engineering, medical usage, AI agent, remote sensing, video and audio processing, 3D point cloud analysis, and others; (3) "where to evaluate" that summarizes MLLM evaluation benchmarks into general and specific benchmarks; (4) "how to evaluate" that reviews and illustrates MLLM evaluation steps and metrics; Our overarching goal is to provide valuable insights for researchers in the field of MLLM evaluation, thereby facilitating the development of more capable and reliable MLLMs. We emphasize that evaluation should be regarded as a critical discipline, essential for advancing the field of MLLMs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes