CV | Apr 27, 2018

dhSegment: A generic deep-learning approach for document segmentation

arXiv:1804.10371v2 | 190 citations | Has Code
Originality: Incremental advance
AI Analysis

The work provides a flexible, open-source solution for handling variability in historical document series, though the advance is incremental: it builds on existing CNN methods and adds task-specific post-processing.

The paper tackles the problem of diverse historical document processing tasks by proposing a generic deep-learning approach using a single CNN architecture for pixel-wise prediction, achieving competitive results across multiple tasks like page extraction and layout analysis.

In recent years there have been multiple successful attempts at tackling document processing problems separately by designing task-specific, hand-tuned strategies. We argue that the diversity of historical document processing tasks prohibits solving them one at a time and shows the need for generic approaches that can handle the variability of historical series. In this paper, we address multiple tasks simultaneously, such as page extraction, baseline extraction, layout analysis, and the extraction of multiple typologies of illustrations and photographs. We propose an open-source implementation of a CNN-based pixel-wise predictor coupled with task-dependent post-processing blocks. We show that a single CNN architecture can be used across tasks with competitive results. Moreover, most of the task-specific post-processing steps can be decomposed into a small number of simple, standard, reusable operations, adding to the flexibility of our approach.
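
To make the "pixel-wise predictor plus post-processing block" idea concrete, here is a minimal sketch of what one post-processing block for page extraction might look like. It is not dhSegment's actual API: the function names (`binarize`, `connected_components`, `bounding_box`, `extract_page`), the 0.5 threshold, and the choice of operations (thresholding, connected components, bounding box) are all assumptions made for illustration, and the probability map is a hand-written toy array standing in for the CNN output.

```python
from collections import deque

# Hypothetical helpers; names and defaults are illustrative assumptions,
# not part of the dhSegment code base.

def binarize(prob_map, threshold=0.5):
    """Turn a per-pixel probability map into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

def connected_components(mask):
    """Group foreground pixels into 4-connected components.

    Returns a list of components, each a list of (row, col) tuples.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unseen foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

def bounding_box(pixels):
    """Return (min_row, min_col, max_row, max_col) for a set of pixels."""
    rows = [y for y, _ in pixels]
    cols = [x for _, x in pixels]
    return min(rows), min(cols), max(rows), max(cols)

def extract_page(prob_map, threshold=0.5):
    """Compose the reusable steps: threshold, label, keep largest, box it."""
    components = connected_components(binarize(prob_map, threshold))
    if not components:
        return None
    return bounding_box(max(components, key=len))

# Toy "page" probability map: one 2x2 blob plus an isolated noisy pixel.
toy_probs = [
    [0.1, 0.1, 0.1, 0.1, 0.9],
    [0.1, 0.8, 0.9, 0.1, 0.1],
    [0.1, 0.9, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
page_box = extract_page(toy_probs)
print(page_box)  # (1, 1, 2, 2)
```

Under these assumptions, a different task would reuse `binarize` and `connected_components` and swap only the final step, which is the kind of reuse the abstract describes.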

Code Implementations: 5 repos
