CV
Aug 9, 2014

Automatic Removal of Marginal Annotations in Printed Text Document

arXiv:1408.2015v1
1 citation
Originality Synthesis-oriented
AI Analysis

This work addresses the challenge of document restoration for archival or digitization purposes, but it is incremental, as it builds on existing projection-profile and connected-component methods.

The paper tackles the problem of automatically recovering original printed text from documents with handwritten marginal annotations by detecting and removing these annotations without losing information, achieving 89.01% accuracy in annotation removal and 97.74% in text retrieval.

Recovering the original printed text from a document with handwritten annotations added in the marginal area is a challenging problem, especially when the original document is not available. This paper therefore aims to automatically recover the original document from the annotated one by detecting and removing any handwritten annotations in the marginal area without loss of information. A two-stage algorithm is proposed. In the first stage, the marginal boundary is detected only approximately, using horizontal and vertical projection profiles, so all marginal annotations are removed along with some of the original printed text that lies very close to the boundary. The second stage therefore uses connected components to restore the printed text components cropped during the first stage. The method is validated on a dataset of 50 documents with complex handwritten annotations, giving an overall accuracy of 89.01% in removing the marginal annotations and 97.74% in retrieving the original printed text.
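The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `frac` threshold, and the rule "a column or row belongs to the text block if its ink count reaches a fraction of the peak" are all assumptions made for illustration, since the abstract does not give those details. The input is assumed to be a binarized page where `True` marks ink.

```python
from collections import deque

import numpy as np


def dense_span(profile, frac):
    """Half-open index range from the first to the last bin reaching frac * peak."""
    idx = np.flatnonzero(profile >= frac * profile.max())
    return int(idx[0]), int(idx[-1]) + 1


def label_components(ink):
    """8-connected component labelling by breadth-first flood fill."""
    labels = np.zeros(ink.shape, dtype=int)
    rows, cols = ink.shape
    count = 0
    for r, c in zip(*np.nonzero(ink)):
        if labels[r, c]:
            continue  # pixel already belongs to a labelled component
        count += 1
        labels[r, c] = count
        queue = deque([(r, c)])
        while queue:
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and ink[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


def remove_marginal_annotations(ink, frac=0.5):
    """Return (stage-1 crop, stage-2 restored page) for a boolean ink image.

    frac is a hypothetical threshold, not a value taken from the paper.
    """
    ink = np.asarray(ink, dtype=bool)

    # Stage 1: approximate the text block from the projection profiles
    # and discard everything outside it.
    top, bottom = dense_span(ink.sum(axis=1), frac)  # ink count per row
    left, right = dense_span(ink.sum(axis=0), frac)  # ink count per column
    inside = np.zeros_like(ink)
    inside[top:bottom, left:right] = True
    cleaned = ink & inside

    # Stage 2: any connected component with at least one pixel inside the
    # block is treated as printed text and brought back in full.
    labels, _ = label_components(ink)
    keep = np.unique(labels[cleaned])
    restored = ink & np.isin(labels, keep)
    return cleaned, restored
```

One limitation of this simplified restoration rule is that an annotation stroke touching printed text would be restored along with it; the strategy in the paper may handle such cases differently.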

