CV · Apr 29, 2022

Towards Automatic Parsing of Structured Visual Content through the Use of Synthetic Data

arXiv:2204.14136v1 · 2 citations · h-index: 70
Originality: Synthesis-oriented
AI Analysis

The paper addresses automated knowledge aggregation and accessibility for visually impaired people, but its contribution is incremental: it applies existing supervised learning methods to a new dataset.

The paper tackles the problem of making structured visual content (SVC) like graphs machine-readable by proposing a synthetic dataset of 12,000 images with annotations, and demonstrates a model that extracts graph representations from SVC images, showing transferability to real data.

Structured Visual Content (SVC) such as graphs, flow charts, or the like are used by authors to illustrate various concepts. While such depictions allow the average reader to better understand the contents, images containing SVCs are typically not machine-readable. This, in turn, hinders not only automated knowledge aggregation but also the perception of displayed information for visually impaired people. In this work, we propose a synthetic dataset containing SVCs in the form of images as well as ground truths. We show the usage of this dataset with an application that automatically extracts a graph representation from an SVC image. This is done by training a model via common supervised learning methods. As there currently exist no large-scale public datasets for the detailed analysis of SVC, we propose the Synthetic SVC (SSVC) dataset comprising 12,000 images with respective bounding box annotations and detailed graph representations. Our dataset enables the development of strong models for the interpretation of SVCs while skipping the time-consuming dense data annotation. We evaluate our model on both synthetic and manually annotated data and show the transferability of synthetic to real via various metrics, given the presented application. Here, we find that this proof of concept is possible to some extent and lay down a solid baseline for this task. We discuss the limitations of our approach for further improvements. Our utilized metrics can be used as a tool for future comparisons in this domain. To enable further research on this task, the dataset is publicly available at https://bit.ly/3jN1pJJ
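To make the idea of a "graph representation with bounding box annotations" concrete, the following is a minimal sketch in Python. It is an illustration only: the abstract does not specify the SSVC annotation format or the exact metrics, so the `Node` and `SVCGraph` classes, the box layout (x_min, y_min, x_max, y_max), and the `edge_precision_recall` function are all hypothetical names and assumptions, not the authors' implementation. The sketch also assumes predicted and ground-truth nodes have already been matched to shared ids.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One shape in the image (hypothetical layout, not the SSVC format)."""
    node_id: str
    box: tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels
    label: str = ""


@dataclass
class SVCGraph:
    """Hypothetical graph representation of a single SVC image."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)  # directed (source, target)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: str, target: str) -> None:
        # Refuse edges whose endpoints were never registered as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((source, target))


def edge_precision_recall(predicted: SVCGraph, truth: SVCGraph) -> tuple[float, float]:
    """Edge-level precision and recall, assuming node ids are already matched."""
    true_positives = len(predicted.edges & truth.edges)
    precision = true_positives / len(predicted.edges) if predicted.edges else 0.0
    recall = true_positives / len(truth.edges) if truth.edges else 0.0
    return precision, recall


# Ground truth: three nodes connected as a -> b -> c.
truth = SVCGraph()
for node_id, box in [("a", (10, 10, 60, 40)), ("b", (100, 10, 150, 40)), ("c", (55, 90, 105, 120))]:
    truth.add_node(Node(node_id, box))
truth.add_edge("a", "b")
truth.add_edge("b", "c")

# Prediction: same nodes, one correct edge and one wrong edge.
predicted = SVCGraph(nodes=dict(truth.nodes))
predicted.add_edge("a", "b")
predicted.add_edge("a", "c")

precision, recall = edge_precision_recall(predicted, truth)
```

Comparing edge sets is only one plausible way to score an extracted graph. A fuller evaluation would also need to match predicted boxes to ground-truth boxes (for example by overlap) before edges can be compared, which this sketch deliberately leaves out.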

