CL · AI · CV · HC · LG · Jul 15, 2024

Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness

arXiv:2407.11229v2 · 31 citations · h-index: 15
AI Analysis

This study examines the robustness and consistency of VLMs for chart understanding, which matters for visual language applications. It is incremental, however, because it focuses on evaluation rather than proposing new methods.

This paper evaluated state-of-the-art Visual Language Models (VLMs) on chart question answering, revealing significant performance variations based on question and chart types, highlighting both strengths and weaknesses.

Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.

