ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering
This work addresses the gap between academic benchmarks and practical chart-analysis needs, which matters to researchers and developers in multimodal AI, though its improvement to evaluation methods is incremental.
The paper tackles the evaluation of chart question answering in complex real-world settings by introducing ChartMind, a comprehensive benchmark covering seven task categories, multilingual contexts, and diverse chart formats. It also proposes the ChartLLM framework and reports that it significantly outperforms previous paradigms on ChartMind and on three public benchmarks.
Chart question answering (CQA) has become a critical multimodal task for evaluating the reasoning capabilities of vision-language models. While early approaches have shown promising performance by focusing on visual features or leveraging large-scale pre-training, most existing evaluations rely on rigid output formats and objective metrics, thus ignoring the complex, real-world demands of practical chart analysis. In this paper, we introduce ChartMind, a new benchmark designed for complex CQA tasks in real-world settings. ChartMind covers seven task categories, incorporates multilingual contexts, supports open-domain textual outputs, and accommodates diverse chart formats, bridging the gap between real-world applications and traditional academic benchmarks. Furthermore, we propose a context-aware yet model-agnostic framework, ChartLLM, that focuses on extracting key contextual elements, reducing noise, and enhancing the reasoning accuracy of multimodal large language models. Extensive evaluations on ChartMind and three representative public benchmarks with 14 mainstream multimodal models show that our framework significantly outperforms three common prior CQA paradigms (instruction-following, OCR-enhanced, and chain-of-thought), highlighting the importance of flexible chart understanding for real-world CQA. These findings suggest new directions for developing more robust chart reasoning in future research.