CV · CL · Jun 11, 2019

Labeling, Cutting, Grouping: an Efficient Text Line Segmentation Method for Medieval Manuscripts

arXiv:1906.11894v2 · 33 citations
Originality: Highly original
AI Analysis

This paper addresses the problem of accurately extracting text lines from noisy historical documents, a task that matters to researchers and archivists. It represents a strong, specific gain rather than a foundational advance.

The paper tackles text-line extraction in complex medieval manuscripts by combining deep-learning-based semantic segmentation with a novel extraction algorithm, reducing the error by 80.7% and reaching 99.42% line IU on a challenging dataset.
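To make the two headline numbers easier to read, here is a minimal sketch of how an intersection-over-union (IU) score and a relative error reduction can be computed. It assumes that IU is defined as TP / (TP + FP + FN) and that "error" means 1 minus the score; the summary above does not spell out either definition, so both are assumptions. The function names and the example counts are hypothetical and are not taken from the paper.

```python
def intersection_over_union(tp: int, fp: int, fn: int) -> float:
    """IU from true positives, false positives and false negatives.

    Assumed definition: TP / (TP + FP + FN).
    """
    denom = tp + fp + fn
    if denom == 0:
        raise ValueError("IU is undefined when TP + FP + FN == 0")
    return tp / denom


def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Fraction of the baseline error removed by the new score.

    Assumes scores lie in [0, 1] and that error = 1 - score.
    """
    baseline_error = 1.0 - baseline_score
    new_error = 1.0 - new_score
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_error - new_error) / baseline_error


# Illustrative numbers only (not results from the paper).
example_iu = intersection_over_union(tp=90, fp=5, fn=5)
example_reduction = relative_error_reduction(baseline_score=0.90, new_score=0.98)
```

With these illustrative inputs, moving from a score of 0.90 to 0.98 removes about 80% of the baseline error. This is why a small absolute change near the top of the scale can correspond to a large relative error reduction.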

This paper introduces a new approach to text-line extraction that integrates deep-learning-based pre-classification with state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even for the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents, as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborate scripts. In this work, we propose a novel method which uses semantic segmentation at pixel level as an intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as a strong denoising pre-processing step before performing text-line extraction. Second, we introduce a novel, simple, and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.
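The two-stage idea in the abstract (pixel-level labeling first, line extraction second) can be illustrated with a toy sketch. This is not the paper's extraction algorithm, which the abstract does not describe in detail. As a stand-in, it uses a plain horizontal projection profile on the pixels that a semantic segmentation step has labeled as main text. The class ids, the `extract_line_bands` function, and the toy label map are all hypothetical.

```python
import numpy as np

# Hypothetical class ids for a pixel-level label map (assumption, not from the paper).
BACKGROUND = 0
MAIN_TEXT = 1
GLOSS = 2


def extract_line_bands(label_map: np.ndarray,
                       text_class: int = MAIN_TEXT,
                       min_pixels: int = 1) -> list[tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, that contain main text.

    Step 1 (denoising): keep only pixels labeled as main text, which drops
    glosses, decorations and background noise.
    Step 2 (stand-in extraction): group consecutive rows whose text-pixel
    count reaches min_pixels, i.e. a horizontal projection profile.
    """
    text_mask = label_map == text_class
    row_counts = text_mask.sum(axis=1)
    is_text_row = row_counts >= min_pixels

    bands = []
    start = None
    for y, has_text in enumerate(is_text_row):
        if has_text and start is None:
            start = y                      # a new band begins
        elif not has_text and start is not None:
            bands.append((start, y))       # the current band ends
            start = None
    if start is not None:                  # band touching the bottom edge
        bands.append((start, len(is_text_row)))
    return bands


# Toy 8x6 label map: two text lines with an interlinear gloss between them.
toy_labels = np.zeros((8, 6), dtype=np.int64)
toy_labels[1:3, 1:5] = MAIN_TEXT
toy_labels[4, 2:4] = GLOSS
toy_labels[5:7, :] = MAIN_TEXT

toy_bands = extract_line_bands(toy_labels)  # [(1, 3), (5, 7)]
```

In the toy example the gloss row is ignored because its pixels do not carry the main-text label, which is the sense in which a semantic segmentation can act as a denoising step. A projection profile only works for roughly horizontal, well-separated lines, so real manuscripts would need a more robust extraction step.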

Code Implementations: 1 repo
