CV · Apr 15, 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

arXiv:2404.09987v2 · 52 citations · h-index: 22 · MM
Originality: Incremental advance
AI Analysis

This paper addresses unreliable chart information extraction, a problem for users in data analysis and visualization. The advance appears incremental, as it builds on existing LVLM architectures.

The paper tackles chart parsing by proposing OneChart, an agent that adds an auxiliary token and a self-evaluation mechanism to make the numerical parts of its output more reliable. It reports significantly higher Average Precision (AP) than SOTA chart parsing models and raises the downstream ChartQA accuracy of LLaVA-1.6 by more than 10%.

Chart parsing is challenging because charts vary widely in style, values, text, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Like popular LVLMs, OneChart incorporates an autoregressive main body. Uniquely, to enhance the reliability of the numerical parts of the output, we introduce an auxiliary token, placed at the beginning of the token sequence, along with an additional decoder. Because this auxiliary token is optimized for numerical values, subsequent chart parsing tokens can capture enhanced numerical features from it through causal attention. Furthermore, with the aid of the auxiliary token, we have devised a self-evaluation mechanism that lets the model gauge the reliability of its chart parsing results by providing confidence scores for the generated content. Compared to current state-of-the-art (SOTA) chart parsing models, e.g., DePlot, ChartVLM, and ChartAst, OneChart achieves significantly higher Average Precision (AP) for chart structural extraction across multiple public benchmarks, despite having only 0.2 billion parameters. Moreover, as a chart parsing agent, it also brings accuracy gains of more than 10% for the popular LVLM LLaVA-1.6 on the downstream ChartQA benchmark.
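The two mechanisms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formula for the confidence score, so the scoring rule here (one minus the mean absolute gap between min-max normalized numbers) is an assumption, as are the function names and the `A: 10 | B: 20` text format. The first function shows why a token at position 0 is visible to every later token under causal attention; the rest compares numbers parsed from generated text with numbers from a hypothetical auxiliary decoder.

```python
import re


def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def extract_numbers(text):
    """Pull numeric literals out of generated chart text (hypothetical format)."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]


def confidence_score(generated_text, aux_values):
    """Assumed rule: 1 minus the mean absolute gap between normalized numbers.

    Returns 0.0 when the two sources disagree on how many numbers there are.
    """
    text_values = extract_numbers(generated_text)
    if not text_values or len(text_values) != len(aux_values):
        return 0.0
    a = min_max_normalize(text_values)
    b = min_max_normalize(aux_values)
    gap = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - gap


if __name__ == "__main__":
    # Column 0 (the auxiliary token) is attendable from every row.
    mask = causal_mask(4)
    print(all(row[0] for row in mask))

    # Text output that agrees with the auxiliary decoder scores higher
    # than text output that contains a wrong value.
    aux = [10.0, 20.0, 40.0]
    print(confidence_score("A: 10 | B: 20 | C: 40", aux))
    print(confidence_score("A: 10 | B: 35 | C: 40", aux))
```

A caller could threshold the score to decide whether to trust the parsed chart before passing it to a downstream model; the threshold value would need to be tuned and is not specified here.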

Code Implementations: 1 repo
