Mar 23, 2025

Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering

arXiv:2503.18172v5, 15 citations, h-index: 9, EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the risk that misleading charts pose to public understanding and AI safety, but the advance is incremental because it builds on existing MLLM capabilities for chart comprehension.

The paper tackles misleading visualizations by introducing the Misleading ChartQA benchmark, a large-scale dataset of 3,026 examples for evaluating multimodal large language models (MLLMs). It reports that a novel region-aware reasoning pipeline improves model accuracy, though the abstract gives no specific performance numbers.

Misleading visualizations, which manipulate chart representations to support specific claims, can distort perception and lead to incorrect conclusions. Despite decades of research, they remain a widespread issue, posing risks to public understanding and raising safety concerns for AI systems involved in data-driven communication. While recent multimodal large language models (MLLMs) show strong chart comprehension abilities, their capacity to detect and interpret misleading charts remains unexplored. We introduce the Misleading ChartQA benchmark, a large-scale multimodal dataset designed to evaluate MLLMs on misleading chart reasoning. It contains 3,026 curated examples spanning 21 misleader types and 10 chart types, each with standardized chart code, CSV data, multiple-choice questions, and labeled explanations, validated through iterative MLLM checks and expert human review. We benchmark 24 state-of-the-art MLLMs, analyze their performance across misleader types and chart formats, and propose a novel region-aware reasoning pipeline that enhances model accuracy. Our work lays the foundation for developing MLLMs that are robust, trustworthy, and aligned with the demands of responsible visual communication.
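
To make the dataset description above concrete, here is a minimal sketch of how one benchmark item and a per-category accuracy breakdown might be represented. It is not the authors' code or file format: the `Example` class, all of its field names, the `accuracy_by_group` helper, and the toy records (including the misleader labels "truncated axis" and "dual axis") are hypothetical illustrations based only on the components the abstract lists.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Example:
    """One benchmark item. Every field name here is a hypothetical choice."""
    misleader_type: str            # illustrative label, e.g. "truncated axis"
    chart_type: str                # e.g. "bar"
    answer: str                    # letter of the correct multiple-choice option
    question: str = ""
    options: dict = field(default_factory=dict)  # option letter -> option text
    chart_code: str = ""           # plotting code that renders the chart
    csv_data: str = ""             # underlying data as CSV text
    explanation: str = ""          # labeled explanation of the misleader


def accuracy_by_group(examples, predictions, key="misleader_type"):
    """Return {group: fraction correct}, grouping examples by attribute `key`."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        group = getattr(ex, key)
        total[group] += 1
        correct[group] += int(pred == ex.answer)
    return {group: correct[group] / total[group] for group in total}


# Toy records invented for illustration; they are not taken from the dataset.
toy = [
    Example("truncated axis", "bar", answer="A"),
    Example("truncated axis", "line", answer="A"),
    Example("dual axis", "line", answer="B"),
]
toy_predictions = ["A", "B", "B"]  # pretend model outputs

by_misleader = accuracy_by_group(toy, toy_predictions)
by_chart = accuracy_by_group(toy, toy_predictions, key="chart_type")
```

Grouping by a configurable attribute lets the same helper cover both breakdowns the abstract mentions (misleader type and chart format) without duplicating the counting logic.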

