CV · Dec 28, 2020

Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

Mélodie Boillet, Christopher Kermorvant, Thierry Paquet

arXiv:2012.14163v27.228 citations

Originality: Incremental advance

AI Analysis

This work provides a more effective method for text line detection in historical documents, which is crucial for digitizing and analyzing archival materials.

This paper introduces Doc-UFCN, a fully convolutional network trained from scratch for text line detection in historical documents, treating it as a pixel-wise classification task. Doc-UFCN outperforms state-of-the-art methods on various datasets and demonstrates that pre-training on natural scene images is not necessary, while pre-training on multiple document datasets can further improve performance.

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and more generally layout analysis, as a pixel-wise classification task, so our model outputs a pixel-level labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets and also demonstrate that parts pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics for a fair and complete comparison between the methods.
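
To make the "pixel-wise classification" framing concrete, here is a minimal sketch in plain Python. It is not the Doc-UFCN implementation: the function names `label_pixels` and `iou`, the two-class setup (0 = background, 1 = text line), and the toy score values are all assumptions made for illustration. The abstract mentions "various metrics" without naming them here, so intersection-over-union is used only as a common example of a pixel-level metric.

```python
def label_pixels(scores):
    """Turn per-class score maps into a label map.

    scores[c][y][x] is the (hypothetical) network score for class c at
    pixel (y, x). Each pixel gets the index of its highest-scoring class.
    """
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


def iou(pred, target, cls):
    """Intersection-over-union of one class between two label maps."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            inter += in_pred and in_target
            union += in_pred or in_target
    # If the class is absent from both maps, treat the match as perfect.
    return inter / union if union else 1.0


# Toy 2x3 "image": class 0 = background, class 1 = text line (assumed).
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3]],  # background scores
    [[0.1, 0.8, 0.9], [0.2, 0.3, 0.7]],  # text-line scores
]
target = [[0, 1, 1], [0, 1, 1]]          # made-up ground truth

pred = label_pixels(scores)
line_iou = iou(pred, target, cls=1)
```

In this toy case the prediction misses one text-line pixel, so the text-line IoU is 3/4. A real model would produce the score maps from a U-shaped network's final layer; only the labeling and scoring steps are shown.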
