CV | Jan 22, 2025

DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning

arXiv:2501.12898v1 | 4 citations | h-index: 3 | WACV
Originality: Incremental advance
AI Analysis

The paper addresses a practical problem: document recognition in applications with minimal annotated data. The advance is incremental, since it builds on existing test-time training and meta-learning techniques.

The paper tackles handwritten document recognition with complex backgrounds and limited annotated data. It introduces the DocTTT framework, which combines test-time training with meta-auxiliary learning, and reports significant performance improvements over state-of-the-art methods on benchmark datasets.

Despite recent significant advancements in Handwritten Document Recognition (HDR), the efficient and accurate recognition of text against complex backgrounds, diverse handwriting styles, and varying document layouts remains a practical challenge. Moreover, this issue is seldom addressed in academic research, particularly in scenarios with minimal annotated data available. In this paper, we introduce the DocTTT framework to address these challenges. The key innovation of our approach is that it uses test-time training to adapt the model to each specific input during testing. We propose a novel Meta-Auxiliary learning approach that combines Meta-learning and self-supervised Masked Autoencoder (MAE). During testing, we adapt the visual representation parameters using a self-supervised MAE loss. During training, we learn the model parameters using a meta-learning framework, so that the model parameters are learned to adapt to a new input effectively. Experimental results show that our proposed method significantly outperforms existing state-of-the-art approaches on benchmark datasets.
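The test-time adaptation step described in the abstract can be illustrated with a toy sketch. This is not the DocTTT implementation: it assumes a tiny linear encoder `w`, a linear reconstruction head `d`, and a scalar recognition head `c`, all hypothetical names chosen for illustration. It shows only the per-input adaptation step (one gradient step on a masked-reconstruction loss, updating the encoder alone) and omits the meta-learning outer loop used during training.

```python
# Toy illustration of test-time training with a masked-reconstruction
# auxiliary loss. All names and the linear model are assumptions made
# for this sketch; the paper's actual architecture is not reproduced.

def mask_input(x, mask):
    """Zero out the entries of x where mask is True."""
    return [0.0 if m else v for v, m in zip(x, mask)]

def encode(w, x):
    """Linear encoder: a single scalar feature."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mae_loss(w, d, x, mask):
    """Mean squared error of reconstructing the masked entries only."""
    h = encode(w, mask_input(x, mask))
    masked = [j for j in range(len(x)) if mask[j]]
    return sum((d[j] * h - x[j]) ** 2 for j in masked) / len(masked)

def mae_grad_w(w, d, x, mask):
    """Analytic gradient of mae_loss with respect to the encoder w."""
    visible = mask_input(x, mask)
    h = encode(w, visible)
    masked = [j for j in range(len(x)) if mask[j]]
    dloss_dh = sum(2.0 * (d[j] * h - x[j]) * d[j] for j in masked) / len(masked)
    return [dloss_dh * xk for xk in visible]

def adapt_at_test_time(w, d, x, mask, lr=0.05, steps=1):
    """Return a copy of w adapted to one input; the original w is untouched."""
    w_adapted = list(w)
    for _ in range(steps):
        grad = mae_grad_w(w_adapted, d, x, mask)
        w_adapted = [wi - lr * gi for wi, gi in zip(w_adapted, grad)]
    return w_adapted

def recognize(w, c, x):
    """Main-task prediction from the (possibly adapted) encoder."""
    return c * encode(w, x)

# One test input, with its second and fourth entries masked.
x = [1.0, 2.0, 3.0, 4.0]
mask = [False, True, False, True]
w = [0.1, 0.1, 0.1, 0.1]   # shared encoder parameters
d = [0.5, 0.5, 0.5, 0.5]   # reconstruction head (kept fixed at test time)
c = 1.0                    # recognition head (kept fixed at test time)

loss_before = mae_loss(w, d, x, mask)
w_adapted = adapt_at_test_time(w, d, x, mask)
loss_after = mae_loss(w_adapted, d, x, mask)
prediction = recognize(w_adapted, c, x)
```

Adaptation starts from the shared parameters for each input and returns a copy, so one test sample does not influence the next. In the approach the abstract describes, the meta-learning stage is what trains those shared parameters so that such a step also helps the recognition task; that stage is left out of this sketch.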

