CL · Apr 30, 2021

Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts

arXiv:2104.14925v1 · 726 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for efficient manual database curation and for OCR of biomedical expressions. Its contribution is incremental, since it builds on standard components.

The paper tackles the problem of automatically aligning printed documents with their electronic full-text versions at the word level. It reports an F-score of 85.01 in the basic setup and up to 86.63 with pre- and post-processing.

We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
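To make the idea of a word-level alignment and its F-score concrete, here is a minimal sketch. It is not the authors' procedure: it assumes the OCR output and the full text are already tokenized into words, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner. The names `align_words` and `f_score` and the toy sentences are hypothetical.

```python
from difflib import SequenceMatcher


def align_words(ocr_tokens, text_tokens):
    """Return (ocr_index, text_index) pairs for tokens that match exactly.

    Assumption: exact string matching on contiguous runs, as a stand-in
    for whatever alignment components the paper actually uses.
    """
    matcher = SequenceMatcher(None, ocr_tokens, text_tokens, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of `size` identical tokens starting at a / b.
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over alignment pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: one OCR error ("bindz") breaks an exact match.
ocr = "the protein bindz to DNA".split()
full = "the protein binds to DNA".split()
gold = [(i, i) for i in range(5)]  # hypothetical gold alignment

pairs = align_words(ocr, full)
score = f_score(pairs, gold)
```

In this toy case the misrecognized word is left unaligned, so precision stays at 1.0 while recall drops to 0.8. Normalizing or fuzzy-matching tokens before alignment is one plausible way that pre- and post-processing could raise the score, though the abstract does not say what those steps are.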

Code Implementations: 1 repo
