CV · Aug 29, 2023

Is it an i or an l: Test-time Adaptation of Text Line Recognition Models

Debapriya Tula, Sujoy Paul, Gagan Madan, Peter Garst, Reeve Ingle, Gaurav Aggarwal

arXiv:2308.15037v1 · 3.92 citations · h-index: 19

Originality: Incremental advance

AI Analysis

This work addresses inconsistent handwriting and image corruptions in document digitization, though it is an incremental advance because it builds on existing self-training methods.

The paper tackles the problem of text line recognition errors in handwritten documents by adapting models at test time using only a single unlabeled image, achieving up to an 8% absolute improvement in character error rate.
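For readers unfamiliar with the metric, character error rate (CER) is conventionally the character-level edit distance between the recognized text and the reference, divided by the reference length. The sketch below assumes that standard definition; the paper's exact evaluation protocol is not given here, and the helper names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate relative to the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Confusing an 'i' with an 'l' once in a 7-character word is one substitution.
single_confusion = cer("illness", "lllness")
```

An absolute improvement is a difference in percentage points: with hypothetical values, a CER that drops from 20% to 12% improves by 8% absolute, which is a 40% relative reduction.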

Recognizing text lines from images is a challenging problem, especially for handwritten documents due to large variations in writing styles. While text line recognition models are generally trained on large corpora of real and synthetic data, such models can still make frequent mistakes if the handwriting is inscrutable or the image acquisition process adds corruptions, such as noise, blur, compression, etc. Writing style is generally quite consistent for an individual, which can be leveraged to correct mistakes made by such models. Motivated by this, we introduce the problem of adapting text line recognition models during test time. We focus on a challenging and realistic setting where, given only a single test image consisting of multiple text lines, the task is to adapt the model such that it performs better on the image, without any labels. We propose an iterative self-training approach that uses feedback from the language model to update the optical model, with confident self-labels in each iteration. The confidence measure is based on an augmentation mechanism that evaluates the divergence of the prediction of the model in a local region. We perform rigorous evaluation of our method on several benchmark datasets as well as their corrupted versions. Experimental results on multiple datasets spanning multiple scripts show that the proposed adaptation method offers an absolute improvement of up to 8% in character error rate with just a few iterations of self-training at test time.
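The self-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict`, `update`, and the augmentations are hypothetical callables supplied by the caller, divergence is approximated as the average character disagreement between the prediction on a line and the predictions on its augmented copies, and the language-model feedback step is not modelled.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Line = TypeVar("Line")  # whatever represents one text-line image


def disagreement(a: str, b: str) -> float:
    """Fraction of positions where two strings differ; extra length counts as mismatch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / longest


def select_confident(
    lines: Sequence[Line],
    predict: Callable[[Line], str],
    augmentations: Sequence[Callable[[Line], Line]],
    max_divergence: float = 0.1,  # assumed threshold, not from the paper
) -> List[Tuple[Line, str]]:
    """Keep (line, prediction) pairs whose prediction is stable under augmentation."""
    selected = []
    for line in lines:
        base = predict(line)
        scores = [disagreement(base, predict(aug(line))) for aug in augmentations]
        divergence = sum(scores) / len(scores) if scores else 0.0
        if divergence <= max_divergence:
            selected.append((line, base))
    return selected


def adapt(
    lines: Sequence[Line],
    predict: Callable[[Line], str],
    update: Callable[[List[Tuple[Line, str]]], None],
    augmentations: Sequence[Callable[[Line], Line]],
    iterations: int = 3,
    max_divergence: float = 0.1,
) -> List[str]:
    """Iteratively update the model on confident self-labels from one test image."""
    for _ in range(iterations):
        pseudo_labels = select_confident(lines, predict, augmentations, max_divergence)
        if not pseudo_labels:
            break  # nothing confident enough to learn from
        update(pseudo_labels)  # e.g. a few gradient steps on the self-labels
    return [predict(line) for line in lines]
```

In a real system, `predict` would wrap the optical model plus decoding, and `update` would fine-tune it on the selected pairs; both are left abstract here because the abstract does not specify them.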
