AI, MM | Sep 21, 2024

A Survey on Multimodal Benchmarks: In the Era of Large AI Models

arXiv:2409.18142v1 | 30 citations | h-index: 11
Originality: Synthesis-oriented
AI Analysis

The survey addresses a gap in the analysis of benchmarks for MLLMs, which matters to researchers in AI and multimodal systems. As a survey, its contribution is incremental rather than a novel method.

This survey systematically reviews 211 benchmarks for evaluating Multimodal Large Language Models across understanding, reasoning, generation, and application domains, analyzing task designs, metrics, and dataset constructions to provide a comprehensive overview of benchmarking practices.

The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial advancements in artificial intelligence, significantly enhancing the capability to understand and generate multimodal content. While prior studies have largely concentrated on model architectures and training methodologies, the benchmarks used for evaluating these models remain underexplored. This survey addresses this gap by systematically reviewing 211 benchmarks that assess MLLMs across four core domains: understanding, reasoning, generation, and application. We provide a detailed analysis of task designs, evaluation metrics, and dataset constructions across diverse modalities. We hope that this survey will contribute to the ongoing advancement of MLLM research by offering a comprehensive overview of benchmarking practices and identifying promising directions for future work. An associated GitHub repository collecting the latest papers is available.

Code Implementations: 1 repo