SciCap: Generating Captions for Scientific Figures
This work addresses poor figure captions in scientific articles, which can hinder understanding, by providing a dataset and baseline models for automated captioning. The contribution is incremental in that it builds on existing captioning techniques.
The authors tackled low-quality figure captions in scientific papers by proposing an end-to-end neural framework that automatically generates informative captions. They introduced SCICAP, a large-scale dataset of more than two million figures drawn from computer science arXiv papers, and established baseline models for captioning graph plots. The results revealed both opportunities and challenges in caption generation.
Researchers use figures to communicate rich, complex information in scientific papers. The captions of these figures are critical to conveying effective messages. However, low-quality figure captions commonly occur in scientific articles and may decrease understanding. In this paper, we propose an end-to-end neural framework to automatically generate informative, high-quality captions for scientific figures. To this end, we introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020. After pre-processing (including figure-type classification, sub-figure identification, text normalization, and caption text selection), SCICAP contained more than two million figures extracted from over 290,000 papers. We then established baseline models that caption graph plots, the dominant (19.2%) figure type. The experimental results showed both opportunities and steep challenges of generating captions for scientific figures.