CL, CV | Aug 9, 2024

MSG-Chart: Multimodal Scene Graph for ChartQA

arXiv:2408.04852v1 | 4 citations | h-index: 21
Originality: Incremental advance
AI Analysis

This paper addresses the problem of interpreting complex charts for automated question answering. The contribution appears incremental: it adds a new graph module on top of existing vision transformers.

The authors tackled the challenge of automatic Chart Question Answering by designing a joint multimodal scene graph to represent relationships between chart elements and patterns, which improved performance on ChartQA and OpenCQA benchmarks.

Automatic Chart Question Answering (ChartQA) is challenging because chart elements have a complex distribution and the patterns of the underlying data are not explicitly displayed in charts. To address this challenge, we design a joint multimodal scene graph for charts to explicitly represent the relationships between chart elements and their patterns. Our proposed multimodal scene graph includes a visual graph and a textual graph to jointly capture the structural and semantic knowledge from the chart. This graph module can be easily integrated with different vision transformers as an inductive bias. Our experiments demonstrate that incorporating the proposed graph module enhances the understanding of the structure and semantics of chart elements, thereby improving performance on the publicly available benchmarks ChartQA and OpenCQA.
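The abstract describes a joint graph made of a visual graph and a textual graph over chart elements, but does not say how either graph is built. The sketch below is only an illustration of that two-graph idea under stated assumptions, not the authors' method: the `ChartElement` fields, the k-nearest-neighbour rule for visual edges, and the reading-order rule for textual edges are all hypothetical choices made here for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class ChartElement:
    """A detected chart element. All fields are hypothetical, for illustration."""
    name: str        # unique node id
    kind: str        # e.g. "bar", "x_label"
    cx: float        # centre x of the element's bounding box
    cy: float        # centre y of the element's bounding box
    text: str = ""   # recognised text, empty for purely visual elements


def visual_edges(elements, k=2):
    """Assumed rule: link each element to its k nearest neighbours by centre distance."""
    edges = set()
    for a in elements:
        others = sorted(
            (b for b in elements if b is not a),
            key=lambda b: (hypot(a.cx - b.cx, a.cy - b.cy), b.name),
        )
        for b in others[:k]:
            edges.add(tuple(sorted((a.name, b.name))))  # undirected edge
    return edges


def textual_edges(elements):
    """Assumed rule: link text-bearing elements that are adjacent in reading order."""
    labelled = sorted((e for e in elements if e.text), key=lambda e: (e.cy, e.cx))
    return {tuple(sorted((a.name, b.name))) for a, b in zip(labelled, labelled[1:])}


def joint_graph(elements, k=2):
    """Bundle both edge sets over a shared node list."""
    return {
        "nodes": [e.name for e in elements],
        "visual": visual_edges(elements, k),
        "textual": textual_edges(elements),
    }


# Toy chart: two bars with one axis label under each.
elements = [
    ChartElement("bar_a", "bar", 10, 50),
    ChartElement("bar_b", "bar", 30, 40),
    ChartElement("label_a", "x_label", 10, 90, "2019"),
    ChartElement("label_b", "x_label", 30, 90, "2020"),
]
graph = joint_graph(elements, k=1)
```

Keeping the two edge sets separate over a shared node list mirrors the abstract's split between structural and semantic knowledge. How such a graph would feed a vision transformer as an inductive bias is not covered by this sketch.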

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
