CVJan 10, 2024

Watermark Text Pattern Spotting in Document Images

arXiv:2401.05167v23 citationsh-index: 8
Originality Synthesis-oriented
AI Analysis

This work addresses the lack of resources for watermark text spotting in documents, which is important for historical and forensic analysis, but it is incremental as it builds on existing text spotting methods.

The authors tackled the problem of detecting and reading watermark text in document images by introducing a new benchmark dataset (K-Watermark) with 65,447 samples and an end-to-end solution (Wextract), which surpassed baselines by 5 AP points in detection and 4 points in character accuracy.

Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience and sometimes even authenticity. Stemming from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships - in the wild, writing can come in various fonts, sizes and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a watermark text patterns rendering procedure. A validity study using humans raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To prove the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) for detecting the bounding box instances of watermark text, while predicting the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents surpassing baselines by 5 AP points in detection and 4 points in character accuracy.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes