Chart Question Answering from Real-World Analytical Narratives
This work addresses the need of researchers and practitioners for more realistic chart question answering benchmarks, though it is incremental: it builds on existing CQA work and contributes mainly a new dataset.
The authors build a chart question answering dataset from real-world visualization notebooks, featuring multi-view charts and natural language questions grounded in analytical narratives. Benchmarking reveals a significant performance gap: GPT-4.1 achieves only 69.3% accuracy, highlighting the difficulty of this more authentic setting.
We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a significant performance gap, with GPT-4.1 achieving an accuracy of 69.3%, underscoring the challenges posed by this more authentic CQA setting.