CL, Feb 27, 2025

Protecting multimodal large language models against misleading visualizations

Jonathan Tonglet, Tinne Tuytelaars, Marie-Francine Moens, Iryna Gurevych

arXiv:2502.20503v4, 12.07 citations, h-index: 76, Has Code

Originality: Incremental advance

AI Analysis

This work addresses the reliability of MLLMs in chart understanding, which matters for preventing disinformation in data-driven communication. The advance is incremental, as it builds on existing MLLM research.

The paper tackles the vulnerability of multimodal large language models (MLLMs) to misleading visualizations, showing that their question-answering accuracy drops to random baseline levels, and introduces inference-time methods that improve performance by up to 19.6 percentage points without harming accuracy on non-misleading charts.

Visualizations play a pivotal role in daily communication in an increasingly data-driven world. Research on multimodal large language models (MLLMs) for automated chart understanding has accelerated massively, with steady improvements on standard benchmarks. However, for MLLMs to be reliable, they must be robust to misleading visualizations, i.e., charts that distort the underlying data, leading readers to draw inaccurate conclusions that may support disinformation. Here, we uncover an important vulnerability: MLLM question-answering (QA) accuracy on misleading visualizations drops on average to the level of the random baseline. To address this, we introduce the first inference-time methods to improve QA performance on misleading visualizations, without compromising accuracy on non-misleading ones. We find that two methods, table-based QA and redrawing the visualization, are effective, with improvements of up to 19.6 percentage points. We make our code and data available.
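The abstract names table-based QA but does not describe how it works. The sketch below illustrates one plausible reading, assuming the method first recovers the chart's underlying data as a text table and then answers the question from that table alone. The function names (`table_based_qa`, `stub_extract_table`, `stub_answer_from_table`) are hypothetical and are not taken from the authors' released code. The two stubs stand in for model calls so the example runs without any model.

```python
from typing import Callable


def table_based_qa(
    chart_image: bytes,
    question: str,
    options: list[str],
    extract_table: Callable[[bytes], str],
    answer_from_table: Callable[[str, str, list[str]], str],
) -> str:
    """Answer a multiple-choice question from the chart's data, not its pixels.

    Both callables are assumptions of this sketch: in practice they would
    wrap an MLLM (chart to table) and a text-only LLM (table to answer).
    """
    # Step 1: recover the underlying data as text (here, CSV).
    table = extract_table(chart_image)
    # Step 2: answer from the table only, so visual distortions in the
    # chart cannot influence the answer directly.
    answer = answer_from_table(table, question, options)
    # Reject free-form output that is not one of the given options.
    if answer not in options:
        raise ValueError(f"Answer {answer!r} is not among the options")
    return answer


def stub_extract_table(chart_image: bytes) -> str:
    # Stand-in for a chart-to-table model call; returns a fixed CSV table.
    return "year,sales\n2020,100\n2021,102"


def stub_answer_from_table(table: str, question: str, options: list[str]) -> str:
    # Stand-in for a text-only model call: compares first and last values.
    rows = [line.split(",") for line in table.splitlines()[1:]]
    first, last = float(rows[0][1]), float(rows[-1][1])
    relative_change = abs(last - first) / first
    return "small" if relative_change < 0.10 else "large"


prediction = table_based_qa(
    b"",  # placeholder image bytes; the stub ignores them
    "How large is the change in sales?",
    ["small", "large"],
    stub_extract_table,
    stub_answer_from_table,
)
```

Passing the two model calls in as functions keeps the pipeline independent of any particular model API, which this page does not specify.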
