SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
This addresses a domain-specific problem for researchers and authors by providing a dataset to reduce the labor-intensive task of poster design, though it is incremental as it builds on existing computer vision models.
The authors tackled the lack of publicly available datasets for scientific poster generation by creating SciPostLayout, a dataset of 7,855 posters with manual layout annotations, and found that layout analysis and generation are more challenging with posters than with scientific papers.
Scientific posters are used to present the contributions of scientific papers effectively in a graphical format. However, creating a well-designed poster that efficiently summarizes the core of a paper is both labor-intensive and time-consuming. A system that can automatically generate well-designed posters from scientific papers would reduce the workload of authors and help readers understand the outline of the paper visually. Despite the demand for poster generation systems, only a limited research has been conduced due to the lack of publicly available datasets. Thus, in this study, we built the SciPostLayout dataset, which consists of 7,855 scientific posters and manual layout annotations for layout analysis and generation. SciPostLayout also contains 100 scientific papers paired with the posters. All of the posters and papers in our dataset are under the CC-BY license and are publicly available. As benchmark tests for the collected dataset, we conducted experiments for layout analysis and generation utilizing existing computer vision models and found that both layout analysis and generation of posters using SciPostLayout are more challenging than with scientific papers. We also conducted experiments on generating layouts from scientific papers to demonstrate the potential of utilizing LLM as a scientific poster generation system. The dataset is publicly available at https://huggingface.co/datasets/omron-sinicx/scipostlayout_v2. The code is also publicly available at https://github.com/omron-sinicx/scipostlayout.