CV · Aug 30, 2023

DTrOCR: Decoder-only Transformer for Optical Character Recognition

arXiv:2308.15996v1 · 72 citations · h-index: 7
Originality: Highly original
AI Analysis

This paper addresses optical character recognition for researchers and practitioners in computer vision. It offers an approach that is simpler than the usual encoder-decoder design and that the authors report is also more effective.

The authors propose DTrOCR, a decoder-only Transformer for text recognition that leverages a pre-trained generative language model. They report that it outperforms state-of-the-art methods by a large margin on printed, handwritten, and scene text in both English and Chinese.

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
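The abstract contrasts an encoder-decoder pipeline with a single decoder that reads the image and writes the text. One way to picture this is as a single sequence in which image patches come first and text tokens follow, all processed under a causal attention mask. The sketch below only illustrates that layout in plain Python. The patch size, the `[SEP]` separator, and every function name are assumptions made for illustration and are not taken from the abstract.

```python
# Illustrative sketch only: the patch size, the separator token, and all
# names below are assumptions, not details stated in the abstract.

def image_to_patches(image, patch_h, patch_w):
    """Split a 2D grayscale image (list of rows) into flattened patches.

    Patches are read left to right, top to bottom, so the result is a
    sequence a decoder could consume like a run of tokens.
    """
    height, width = len(image), len(image[0])
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patch = [
                image[row][col]
                for row in range(top, top + patch_h)
                for col in range(left, left + patch_w)
            ]
            patches.append(patch)
    return patches


def build_decoder_input(patches, text_tokens, sep_token="[SEP]"):
    """Lay out one sequence: image patches, a separator, then text tokens."""
    return (
        [("patch", p) for p in patches]
        + [("token", sep_token)]
        + [("token", t) for t in text_tokens]
    )


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


# A 4x8 toy "image" split into 4x4 patches gives two patch positions.
image = [[row * 8 + col for col in range(8)] for row in range(4)]
patches = image_to_patches(image, patch_h=4, patch_w=4)
sequence = build_decoder_input(patches, text_tokens=["h", "i"])
mask = causal_mask(len(sequence))
```

In this layout there is no separate image encoder: the patch positions sit in the same sequence as the text, and each text position can attend to every patch and to the text before it. A real model would map both patches and tokens to embeddings before the Transformer layers, a step the sketch leaves out.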

Code Implementations: 1 repo