CV · Feb 1, 2021

RectiNet-v2: A stacked network architecture for document image dewarping

arXiv:2102.01120v1 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for distortion-free document images as input to recognition algorithms, particularly images captured with mobile cameras. The advance is incremental, since it builds on existing methods.

The authors tackle document image dewarping, the removal of perspective distortions and folds, with an end-to-end CNN architecture that achieves results comparable to state-of-the-art methods on the DocUNet benchmark.

With the advent of mobile and hand-held cameras, document images have found their way into almost every domain. Dewarping these images to remove perspective distortions and folds is essential so that they can be understood by document recognition algorithms. For this, we propose an end-to-end CNN architecture that produces distortion-free document images from the warped documents it takes as input. We train this model on synthetically simulated warped document images to compensate for the lack of sufficient natural data. Our method is novel in the use of a bifurcated decoder with shared weights to prevent intermingling of grid coordinates, in the use of residual networks in the U-Net skip connections to allow flow of data from different receptive fields in the model, and in the use of a gated network to help the model focus on structure and line-level detail of the document image. We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
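The abstract refers to predicted grid coordinates without spelling out how a grid turns a warped image into a flat one. The sketch below illustrates one common way to do this, and it is not the authors' implementation. It assumes a backward mapping: for every output pixel, `grid_x` and `grid_y` hold the location in the warped image to sample from, with bilinear interpolation between neighbouring pixels. The function names `bilinear_sample` and `dewarp`, the two-array grid layout, and the toy data are all hypothetical.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a fractional (x, y) location."""
    h, w = len(image), len(image[0])
    # Clamp to the image bounds so out-of-range lookups repeat the border.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def dewarp(image, grid_x, grid_y):
    """Assumed backward mapping: output[r][c] = image at (grid_x[r][c], grid_y[r][c])."""
    return [
        [bilinear_sample(image, gx, gy) for gx, gy in zip(row_x, row_y)]
        for row_x, row_y in zip(grid_x, grid_y)
    ]


# Toy data: the "warped" content sits one pixel to the right of where it belongs,
# so each output pixel reads from one column further right (clamped at the border).
warped = [
    [0, 10, 20, 30],
    [0, 10, 20, 30],
]
grid_x = [[1.0, 2.0, 3.0, 3.0], [1.0, 2.0, 3.0, 3.0]]
grid_y = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

flat = dewarp(warped, grid_x, grid_y)
print(flat)
```

Keeping the x and y coordinates in separate arrays loosely mirrors the idea of predicting the two coordinate channels in separate decoder branches, although how the paper's bifurcated decoder arranges its outputs is not stated in the abstract.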

