ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
This work provides a dataset to enhance multimodal AI for infographic processing, but its contribution is incremental because it focuses on data creation rather than novel methods.
The authors address the difficulty that large vision-language models have with infographic charts by introducing ChartGalaxy, a million-scale synthetic dataset. They use it to improve infographic chart understanding through fine-tuning, to benchmark code generation for infographic charts, and to enable example-based infographic chart generation.
Infographic charts are a powerful medium for communicating abstract data by combining visual elements (e.g., charts, images) with textual information. However, their visual and structural richness poses challenges for large vision-language models (LVLMs), which are typically trained on plain charts. To bridge this gap, we introduce ChartGalaxy, a million-scale dataset designed to advance the understanding and generation of infographic charts. The dataset is constructed through an inductive process that identifies 75 chart types, 440 chart variations, and 68 layout templates from real infographic charts and uses them to create synthetic ones programmatically. We showcase the utility of this dataset through: 1) improving infographic chart understanding via fine-tuning, 2) benchmarking code generation for infographic charts, and 3) enabling example-based infographic chart generation. By capturing the visual and structural complexity of real design, ChartGalaxy provides a useful resource for enhancing multimodal reasoning and generation in LVLMs.
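The programmatic construction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the catalog entries, the names `VARIATIONS_BY_TYPE`, `LAYOUT_TEMPLATES`, `InfographicSpec`, and `sample_spec`, and the assumption that every chart variation can be paired with every layout template are all hypothetical. It only shows the general idea of sampling a chart type, one of its variations, and a layout template to obtain a specification that a renderer could turn into a synthetic infographic chart.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InfographicSpec:
    """One synthetic infographic specification (hypothetical structure)."""
    chart_type: str
    variation: str
    layout_template: str


# Hypothetical, tiny stand-in for the catalog extracted from real infographics.
# The dataset described above reports 75 chart types, 440 chart variations,
# and 68 layout templates; the entries here are invented for illustration.
VARIATIONS_BY_TYPE = {
    "bar": ["bar_plain", "bar_rounded", "bar_icon"],
    "line": ["line_plain", "line_area"],
    "pie": ["pie_plain", "pie_donut"],
}
LAYOUT_TEMPLATES = [
    "title_top_chart_center",
    "title_left_chart_right",
    "image_overlay",
]


def sample_spec(rng: random.Random) -> InfographicSpec:
    """Sample a chart type, one of its variations, and a layout template."""
    chart_type = rng.choice(sorted(VARIATIONS_BY_TYPE))
    variation = rng.choice(VARIATIONS_BY_TYPE[chart_type])
    layout_template = rng.choice(LAYOUT_TEMPLATES)
    return InfographicSpec(chart_type, variation, layout_template)


def count_combinations() -> int:
    """Count (variation, layout) pairs, assuming every pairing is valid."""
    n_variations = sum(len(v) for v in VARIATIONS_BY_TYPE.values())
    return n_variations * len(LAYOUT_TEMPLATES)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled specs are reproducible
    for _ in range(3):
        print(sample_spec(rng))
    print(count_combinations())  # 7 variations * 3 layouts = 21
```

Even this toy catalog yields 21 distinct variation and layout pairings before any data or styling is varied, which hints at how a modest set of templates could scale to a large synthetic corpus. In practice some pairings would likely be incompatible and need filtering, which the sketch does not model.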