CV · AI · LG · May 1, 2023

SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

arXiv:2305.00795v3 · 11 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the tedious data annotation problem in document segmentation for researchers and practitioners, though it is incremental as it builds on existing self-supervised techniques.

The paper tackles document layout analysis by addressing the scarcity of labeled data with a self-supervised, vision-based approach that generates pseudo-layouts for pre-training, achieving performance comparable to existing supervised methods.

Document layout analysis is a well-known problem in the document research community and has been explored extensively, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc. However, most existing works have ignored the scarcity of labeled data. With growing internet connectivity in personal life, an enormous number of documents has become available in the public domain, making data annotation a tedious task. We address this challenge using self-supervision and, unlike the few existing self-supervised document segmentation approaches, which use text mining and textual labels, we use a completely vision-based approach in pre-training, without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs on par with, if not better than, the existing methods and their supervised counterparts. The code is made publicly available at: https://github.com/MaitySubhajit/SelfDocSeg
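The abstract does not say how the pseudo-layouts are produced, so the following is only a minimal sketch of one plausible label-free route, assuming dark ink on a light page: binarize the image, smear nearby ink together with a dilation, and take the bounding box of each connected region as a pseudo-layout element. The function name `pseudo_layout_boxes` and the parameters `ink_threshold` and `smear` are hypothetical and are not taken from the paper or its repository.

```python
from collections import deque


def pseudo_layout_boxes(image, ink_threshold=128, smear=1):
    """Return bounding boxes (top, left, bottom, right), inclusive, of ink regions.

    `image` is a list of rows of grayscale values in 0..255.
    Assumption: ink is darker than `ink_threshold`, paper is lighter.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # Step 1: binarize. True marks an ink pixel.
    ink = [[pixel < ink_threshold for pixel in row] for row in image]

    # Step 2: dilate with a (2*smear+1) square so nearby glyphs merge into blocks.
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if ink[y][x]:
                for ny in range(max(0, y - smear), min(height, y + smear + 1)):
                    for nx in range(max(0, x - smear), min(width, x + smear + 1)):
                        mask[ny][nx] = True

    # Step 3: label 4-connected regions with a breadth-first search and
    # record the extent of each one. Boxes include the dilation margin.
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width:
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Boxes of this kind could serve as localization targets during pre-training without any human annotation. A real pipeline would more likely use an image-processing library and tune the smear size to the scan resolution.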

Code Implementations: 1 repo