CV · Dec 11, 2017

Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing

arXiv:1712.03738v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in automated document image processing by enabling hyperparameter optimization without ground truth images. The advance is incremental, since it builds on existing metrics and datasets.

Document image quality metrics typically require ground truth images. The paper works around this limitation by proposing surrogate models that predict metric values on unseen documents, and it evaluates them empirically on the DIBCO and H-DIBCO datasets.

Computation of document image quality metrics often depends upon the availability of a ground truth image corresponding to the document. This limits the applicability of quality metrics in applications such as hyperparameter optimization of image processing algorithms that operate on-the-fly on unseen documents. This work proposes the use of surrogate models to learn the behavior of a given document quality metric on existing datasets where ground truth images are available. The trained surrogate model can later be used to predict the metric value on previously unseen document images without requiring access to ground truth images. The surrogate model is empirically evaluated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets.
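The workflow described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the synthetic "documents", the global-threshold binarizer, the hand-picked features, the use of F-measure as the quality metric, and the k-nearest-neighbour regressor standing in for the surrogate model. The sketch trains the surrogate on documents that have ground truth, then uses it to pick a binarization threshold for an unseen document without consulting that document's ground truth.

```python
import random
import statistics

def f_measure(binarized, truth):
    """Ground-truth-dependent metric: F-measure over ink (1) pixels."""
    tp = sum(b == 1 and g == 1 for b, g in zip(binarized, truth))
    fp = sum(b == 1 and g == 0 for b, g in zip(binarized, truth))
    fn = sum(b == 0 and g == 1 for b, g in zip(binarized, truth))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def binarize(image, threshold):
    """Hypothetical processing step: global threshold, dark pixels are ink."""
    return [1 if p < threshold else 0 for p in image]

def features(image, threshold):
    """Ground-truth-free description of (image, hyperparameter), scaled to ~[0, 1]."""
    ink_fraction = sum(binarize(image, threshold)) / len(image)
    return [threshold / 255, statistics.mean(image) / 255,
            statistics.pstdev(image) / 255, ink_fraction]

class KNNSurrogate:
    """k-nearest-neighbour regressor used here as a stand-in surrogate model."""
    def __init__(self, k=3):
        self.k = k
        self.samples = []

    def fit(self, X, y):
        self.samples = list(zip(X, y))

    def predict(self, x):
        def sq_dist(sample):
            return sum((a - b) ** 2 for a, b in zip(sample[0], x))
        nearest = sorted(self.samples, key=sq_dist)[: self.k]
        return sum(value for _, value in nearest) / len(nearest)

def make_document(rng, n_pixels=400):
    """Synthetic grayscale 'document' as a flat pixel list, plus its ground truth."""
    ink_mean, paper_mean = rng.uniform(50, 70), rng.uniform(180, 200)
    truth = [1 if rng.random() < 0.2 else 0 for _ in range(n_pixels)]
    image = [min(255.0, max(0.0, rng.gauss(ink_mean if g else paper_mean, 20)))
             for g in truth]
    return image, truth

rng = random.Random(0)
thresholds = list(range(20, 240, 20))

# Training phase: ground truth is available, so the true metric can be computed.
X, y = [], []
for _ in range(20):
    image, truth = make_document(rng)
    for t in thresholds:
        X.append(features(image, t))
        y.append(f_measure(binarize(image, t), truth))

surrogate = KNNSurrogate(k=3)
surrogate.fit(X, y)

# Deployment phase: choose the threshold using predicted metric values only.
unseen_image, unseen_truth = make_document(rng)
best_threshold = max(
    thresholds, key=lambda t: surrogate.predict(features(unseen_image, t)))
predicted = surrogate.predict(features(unseen_image, best_threshold))
print(best_threshold, round(predicted, 3))
```

In this sketch `unseen_truth` is generated only so that the prediction can be checked afterwards; the threshold selection itself never reads it, which is the property the surrogate approach is meant to provide.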
