GenPlot: Increasing the Scale and Diversity of Chart Derendering Data
This work addresses a data bottleneck for researchers and practitioners in visual language processing, though its contribution is incremental because it builds on existing chart-derendering methods.
The authors tackle the limited scale and diversity of training data for chart derendering by proposing GenPlot, a plot generator that can produce billions of synthetic plots, expanding the available training data with the aim of improving model performance.
Vertical bar, horizontal bar, dot, scatter, and line plots provide a diverse set of visualizations for representing data. To understand these plots, one must be able to recognize textual components, locate data points in a plot, and process diverse visual contexts to extract information. In recent works such as Pix2Struct, Matcha, and Deplot, OCR-free chart-to-text translation has achieved state-of-the-art results on visual language tasks. These results underscore the importance of chart derendering as a pre-training objective, yet existing datasets provide only a fixed set of training examples. In this paper, we propose GenPlot, a plot generator that can generate billions of additional plots for chart derendering using synthetic data.
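To make the idea of a synthetic plot generator concrete, the following is a minimal sketch of how one training example might be sampled. It is not GenPlot's implementation: the schema, the `AXIS_NAMES` vocabulary, the value ranges, and the `x | y` target format are all assumptions made for illustration, and the rendering step (drawing the image with a plotting library) is omitted.

```python
import random

# Plot types named in the abstract.
PLOT_TYPES = ["vertical_bar", "horizontal_bar", "dot", "scatter", "line"]

# Hypothetical axis-title vocabulary (an assumption, not from the paper).
AXIS_NAMES = ["Year", "Revenue", "Population", "Temperature", "Score"]


def sample_plot_spec(rng):
    """Sample one random plot specification (hypothetical schema)."""
    plot_type = rng.choice(PLOT_TYPES)
    n_points = rng.randint(3, 8)
    x_title, y_title = rng.sample(AXIS_NAMES, 2)
    if plot_type == "scatter":
        # Scatter plots get continuous x values.
        xs = [round(rng.uniform(0, 100), 1) for _ in range(n_points)]
    else:
        # Other plot types get consecutive integer ticks (e.g. years).
        start = rng.randint(1990, 2015)
        xs = [start + i for i in range(n_points)]
    ys = [round(rng.uniform(0, 100), 1) for _ in range(n_points)]
    return {
        "plot_type": plot_type,
        "x_title": x_title,
        "y_title": y_title,
        "xs": xs,
        "ys": ys,
    }


def linearize_table(spec):
    """Flatten the underlying data into a text target for derendering."""
    header = f"{spec['x_title']} | {spec['y_title']}"
    rows = [f"{x} | {y}" for x, y in zip(spec["xs"], spec["ys"])]
    return "\n".join([header] + rows)


# A seeded generator makes the sampled examples reproducible.
rng = random.Random(0)
specs = [sample_plot_spec(rng) for _ in range(3)]
targets = [linearize_table(s) for s in specs]
```

Each specification would be rendered to an image and paired with its linearized table, so the number of distinct examples is limited only by the sampling space rather than by a fixed dataset.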