CV · Oct 28, 2021

DocScanner: Robust Document Image Rectification with Progressive Learning

arXiv:2110.14968v3 · 38 citations
Originality: Incremental advance
AI Analysis

This paper addresses smartphone-based digitization of physical documents for users who need accurate text extraction. The contribution appears incremental, as it builds on existing rectification methods.

The paper tackles the problem of rectifying distorted document images from smartphones by introducing DocScanner, a framework with progressive learning and geometric regularization, which outperforms previous methods on OCR accuracy, image similarity, and distortion metrics by a considerable margin.

Compared with flatbed scanners, portable smartphones provide more convenience for physical document digitization. However, such digitized documents are often distorted due to uncontrolled physical deformations, camera positions, and illumination variations. To this end, we present DocScanner, a novel framework for document image rectification. Unlike existing solutions, DocScanner addresses this issue by introducing a progressive learning mechanism. Specifically, DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture. The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures running efficiency. To further improve the rectification quality, a geometric regularization based on the geometric prior between the distorted and the rectified images is introduced during training. Extensive experiments are conducted on the Doc3D dataset and the DocUNet Benchmark dataset, and the quantitative and qualitative evaluation results verify the effectiveness of DocScanner, which outperforms previous methods on OCR accuracy, image similarity, and our proposed distortion metric by a considerable margin. Furthermore, DocScanner shows superior efficiency in runtime latency and model size.
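The progressive idea in the abstract (one estimate, corrected repeatedly by a recurrent unit) can be illustrated with a toy loop. This is only a minimal sketch under loud assumptions, not the authors' implementation: `identity_grid`, `toy_update`, and `progressive_refine` are hypothetical names, the estimate is represented here as a per-pixel coordinate map, and `toy_update` cheats by reading the ground-truth map, whereas a real model would predict each correction from image features with learned weights.

```python
import numpy as np


def identity_grid(h, w):
    """Return an (h, w, 2) array holding the (x, y) coordinate of every pixel."""
    ys, xs = np.meshgrid(
        np.arange(h, dtype=np.float64),
        np.arange(w, dtype=np.float64),
        indexing="ij",
    )
    return np.stack([xs, ys], axis=-1)


def toy_update(estimate, target, step=0.5):
    """Hypothetical stand-in for a learned recurrent update unit.

    It returns a residual correction that moves the estimate a fraction
    `step` of the way toward `target`. Assumption: a trained network would
    produce this residual from image features, not from the target itself.
    """
    return step * (target - estimate)


def progressive_refine(init, target, num_iters=8):
    """Keep a single estimate and correct it `num_iters` times.

    Returns the final estimate and the mean absolute error after each step.
    """
    estimate = init.copy()
    history = []
    for _ in range(num_iters):
        estimate = estimate + toy_update(estimate, target)
        history.append(float(np.abs(target - estimate).mean()))
    return estimate, history


if __name__ == "__main__":
    grid = identity_grid(32, 48)
    # Synthetic "distortion": a smooth horizontal wave added to the x coordinate.
    target = grid.copy()
    target[..., 0] += 3.0 * np.sin(grid[..., 1] / 5.0)
    _, errors = progressive_refine(grid, target)
    for i, err in enumerate(errors, start=1):
        print(f"iteration {i}: mean abs error = {err:.5f}")
```

With `step=0.5` the remaining error halves at every iteration, which mirrors the qualitative behaviour the abstract describes: each pass only has to fix what the previous passes left uncorrected.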

Code Implementations: 3 repos