CV · Jun 20, 2023

GenPlot: Increasing the Scale and Diversity of Chart Derendering Data

arXiv:2306.11699v1 · 1 citation · h-index: 1 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a data bottleneck for researchers and practitioners in visual language processing, though it is incremental as it builds on existing chart-derendering methods.

The authors address the limited scale and diversity of training data for chart derendering by proposing GenPlot, a plot generator that can produce billions of synthetic plots, so that training is no longer restricted to a fixed set of examples.

Vertical bar, horizontal bar, dot, scatter, and line plots provide a diverse set of visualizations to represent data. To understand these plots, one must be able to recognize textual components, locate data points in a plot, and process diverse visual contexts to extract information. In recent works such as Pix2Struct, MatCha, and DePlot, OCR-free chart-to-text translation has achieved state-of-the-art results on visual language tasks. These results underscore the importance of chart derendering as a pre-training objective, yet existing datasets provide only a fixed set of training examples. In this paper, we propose GenPlot, a plot generator that can generate billions of additional plots for chart derendering using synthetic data.
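As a rough illustration of the idea in the abstract, the sketch below samples a random plot specification and builds the text table a derendering model would be trained to predict from the rendered image. It is not GenPlot's code: the names `PLOT_TYPES`, `sample_plot_spec`, and `to_target_table`, the value ranges, and the `x | y` table format are all assumptions made for illustration, and the rendering step (which would need a plotting library) is left out.

```python
import random

# The five plot families named in the abstract (identifiers are hypothetical).
PLOT_TYPES = ["vertical_bar", "horizontal_bar", "dot", "scatter", "line"]


def sample_plot_spec(rng):
    """Sample a hypothetical plot spec: a plot type plus the data behind it."""
    plot_type = rng.choice(PLOT_TYPES)
    n_points = rng.randint(3, 8)
    if plot_type == "scatter":
        # Scatter plots get a numeric x axis.
        xs = [round(rng.uniform(0, 100), 1) for _ in range(n_points)]
    else:
        # Other plot types get placeholder category labels (an assumption).
        xs = [f"cat_{i}" for i in range(n_points)]
    ys = [round(rng.uniform(0, 100), 1) for _ in range(n_points)]
    return {"type": plot_type, "x": xs, "y": ys}


def to_target_table(spec):
    """Linearize the underlying data into a text table (assumed target format)."""
    rows = ["x | y"]
    rows += [f"{x} | {y}" for x, y in zip(spec["x"], spec["y"])]
    return "\n".join(rows)


rng = random.Random(0)  # fixed seed so the sampled specs are reproducible
specs = [sample_plot_spec(rng) for _ in range(5)]
for spec in specs:
    print(spec["type"])
    print(to_target_table(spec))
```

Because every example is drawn from a random generator rather than read from a fixed dataset, the number of distinct (image, table) pairs is limited only by how many specifications are sampled.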

Code Implementations: 1 repo
