CV · Jan 25, 2023

Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition

arXiv:2301.10593v1 · 12 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the inference speed bottleneck for end-to-end handwritten document recognition, offering a practical improvement for applications requiring real-time processing.

The paper tackles the slow inference of autoregressive document recognition models by introducing Faster DAN, a two-step method that first predicts the first character of each text line and then completes all lines in parallel. It achieves accuracy competitive with the standard DAN while being at least 4 times faster on datasets such as RIMES 2009.
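To see where the speed-up comes from, it helps to count sequential decoder calls. The sketch below is a back-of-the-envelope model, not the authors' code: it assumes one decoder call per predicted token, one line-break token between lines and one end token per pass, and it ignores the extra cost of processing many queries at once. The function names dan_steps and faster_dan_steps are hypothetical.

```python
from typing import List


def dan_steps(lines: List[str]) -> int:
    # Standard DAN-style decoding (assumed token conventions): one sequential
    # decoder call per character, one per line-break token, one end token.
    chars = sum(len(line) for line in lines)
    line_breaks = max(len(lines) - 1, 0)
    return chars + line_breaks + 1


def faster_dan_steps(lines: List[str]) -> int:
    # Pass 1: predict the first character of every line, one after another,
    # then an end token.
    first_pass = len(lines) + 1
    # Pass 2: complete all lines in parallel. The number of sequential calls is
    # bounded by the longest remaining line, plus one end-of-line token.
    longest_rest = max((len(line) - 1 for line in lines), default=0)
    second_pass = longest_rest + 1
    return first_pass + second_pass


lines = ["Dear Sir,", "I am writing to you", "about my contract."]
print(dan_steps(lines))         # 49
print(faster_dan_steps(lines))  # 23
```

Under these assumptions the sequential cost drops from roughly the total number of characters to roughly the number of lines plus the longest line length, so the gap widens on dense single-page and double-page documents with many lines. The measured "at least 4 times faster" figure is a wall-clock result from the paper and is not reproduced by this count.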

Recent advances in handwritten text recognition have made it possible to recognize whole documents end to end: the Document Attention Network (DAN) recognizes the characters one after the other through an attention-based prediction process until it reaches the end of the document. However, this autoregressive process leads to inference that cannot benefit from any parallelization optimization. In this paper, we propose Faster DAN, a two-step strategy to speed up the recognition process at prediction time: the model predicts the first character of each text line in the document, and then completes all the text lines in parallel through multi-target queries and a specific document positional encoding scheme. Faster DAN reaches competitive results compared to standard DAN, while being at least 4 times faster on whole single-page and double-page images of the RIMES 2009, READ 2016 and MAURDOR datasets. Source code and trained model weights are available at https://github.com/FactoDeepLearning/FasterDAN.
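Parallel line completion needs each query to know which line it belongs to and where it sits in that line. A single index over the flattened document is presumably unsuitable here, because every line grows at the same time. The sketch below shows one plausible two-dimensional scheme: half of the encoding carries the line index and half carries the character position within the line, both using standard Transformer sinusoids. This is an assumption for illustration only. The exact encoding used by Faster DAN is defined in the paper and repository, and the helper names here are hypothetical.

```python
import math
from typing import List, Tuple


def sinusoid(pos: int, dim: int) -> List[float]:
    # Standard Transformer sinusoidal encoding of one scalar position into `dim` values.
    enc: List[float] = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:dim]  # trim if dim is odd


def document_positional_encoding(line_idx: int, char_idx: int, d_model: int) -> List[float]:
    # Hypothetical 2D scheme: first half encodes the line index,
    # second half encodes the character position within that line.
    half = d_model // 2
    return sinusoid(line_idx, half) + sinusoid(char_idx, d_model - half)


def token_positions(lines: List[str]) -> List[Tuple[int, int]]:
    # (line index, position-in-line) for every character, so that all lines
    # can be fed to the decoder together as one set of multi-target queries.
    return [(l, c) for l, line in enumerate(lines) for c in range(len(line))]


positions = token_positions(["ab", "c"])
print(positions)  # [(0, 0), (0, 1), (1, 0)]
```

With this layout, the first characters of two different lines share the same in-line half of the encoding but differ in the line half. The model can therefore tell the parallel queries apart without needing to know in advance how long earlier lines will become.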

Code Implementations: 1 repo
