CV · HC · May 22, 2024

AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks

arXiv:2405.13580v1 · 6 citations · h-index: 22 · Has Code · ICDAR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accessible chart interpretation for blind people, though it is incremental as it builds on existing VLM-based methods.

The paper tackles the problem of generating high-quality chart summaries for blind and visually impaired individuals by introducing the AltChart dataset of 10,000 chart images with rich annotations and a new pretraining method for Vision-Language Models, achieving a performance gain of approximately 2.5%.

Chart summarization is a crucial task for blind and visually impaired individuals as it is their primary means of accessing and interpreting graphical data. Crafting high-quality descriptions is challenging because it requires precise communication of essential details within the chart without visual perception. Many chart analysis methods, however, produce brief, unstructured responses that may contain significant hallucinations, affecting their reliability for blind people. To address these challenges, this work presents three key contributions: (1) We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary that features long-context, semantically rich annotations. (2) We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations through training with multiple pretext tasks, yielding a performance gain of ${\sim}2.5\%$. (3) We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are. Our dataset and code are publicly available on our project page: https://github.com/moured/AltChart.

