CV | AI | Apr 15, 2024

RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

arXiv:2404.09530v2 | 5 citations | h-index: 44 | MMAsia
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for robust, adaptable document layout detection models across diverse formats, though the contribution is incremental because it builds on existing domain adaptation methods.

The paper tackles the problem of limited layout diversity in document layout detection datasets by introducing RanLayNet, a synthetic dataset with automatic labels, and shows that models trained on it achieve improved performance, such as 0.398 and 0.588 mAP95 scores for the TABLE class in scientific documents.

Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these models function. To solve this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting spatial positions, ranges, and types of layout elements. The primary aim of this endeavor is to develop a versatile dataset capable of training models with robustness and adaptability to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings show that models enriched with our dataset perform best, achieving mAP95 scores of 0.398 and 0.588 for the TABLE class in the scientific document domain.
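
The abstract describes labels that record the position, extent, and type of each layout element, but it does not say how pages are composed. As a rough illustration only, the sketch below places non-overlapping rectangles at random positions on a blank page and records each one with a type label. The function names, page size, size ranges, and every class name other than TABLE are assumptions made for this example, not details taken from the paper.

```python
import random

# Hypothetical class list; the abstract names only the TABLE class explicitly.
LAYOUT_TYPES = ["TEXT", "TITLE", "TABLE", "FIGURE", "LIST"]


def boxes_overlap(a, b):
    """Return True if two (x, y, w, h) boxes intersect with positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def random_layout(page_w=1000, page_h=1400, n_elements=6, max_tries=200, seed=None):
    """Place up to n_elements non-overlapping boxes on a page.

    Each returned annotation holds the box position and size (x, y, w, h)
    and a randomly chosen layout type, so the label comes for free from
    the generation step instead of from manual annotation.
    """
    rng = random.Random(seed)
    annotations = []
    tries = 0
    while len(annotations) < n_elements and tries < max_tries:
        tries += 1
        # Sample a box size, then a top-left corner that keeps it on the page.
        w = rng.randint(page_w // 8, page_w // 2)
        h = rng.randint(page_h // 16, page_h // 4)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        box = (x, y, w, h)
        # Reject candidates that collide with an already placed element.
        if any(boxes_overlap(box, a["bbox"]) for a in annotations):
            continue
        annotations.append({"bbox": box, "category": rng.choice(LAYOUT_TYPES)})
    return annotations


if __name__ == "__main__":
    for ann in random_layout(seed=0):
        print(ann["category"], ann["bbox"])
```

A real generator would also paste actual content crops into each box and render the page image; this sketch covers only the label side of that process.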

Code Implementations: 1 repo