CV · Aug 30, 2023

DTrOCR: Decoder-only Transformer for Optical Character Recognition

arXiv:2308.15996v1 · 72 citations · h-index: 7

Originality: Highly original

AI Analysis

This paper addresses optical character recognition (OCR) for researchers and practitioners in computer vision, offering an approach that is simpler than the usual encoder-decoder design and reported to be more effective.

The authors propose DTrOCR, a decoder-only Transformer method for text recognition that leverages a pre-trained generative language model. They report that it outperforms state-of-the-art methods by a large margin on printed, handwritten, and scene text in both English and Chinese.

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
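To make the contrast with an encoder-decoder pipeline concrete, here is a minimal sketch of how a decoder-only model could consume an image and text as one sequence. The abstract does not describe the input format, so everything below is an assumption for illustration: the patch splitting, the `<sep>` marker, and the helper names `split_into_patches`, `build_sequence_layout`, and `causal_mask` are all hypothetical. The sketch shows only the sequence layout and the causal attention mask, not a trained model.

```python
from typing import List, Sequence


def split_into_patches(
    image: Sequence[Sequence[float]], patch_h: int, patch_w: int
) -> List[List[float]]:
    """Flatten a 2D grayscale image into row-major patch vectors (assumed input format)."""
    height, width = len(image), len(image[0])
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patches.append(
                [
                    image[r][c]
                    for r in range(top, top + patch_h)
                    for c in range(left, left + patch_w)
                ]
            )
    return patches


def build_sequence_layout(num_patches: int, text_tokens: Sequence[str]) -> List[str]:
    """Label each position of the single sequence: image patches, a separator, then text."""
    # "<patch>" and "<sep>" are placeholder labels chosen for this sketch.
    return ["<patch>"] * num_patches + ["<sep>"] + list(text_tokens)


def causal_mask(length: int) -> List[List[bool]]:
    """Lower-triangular mask: position i may attend only to positions 0..i."""
    return [[col <= row for col in range(length)] for row in range(length)]


# Toy 4x8 "image" split into two 4x4 patches, followed by the target text "hi".
toy_image = [[float(r * 8 + c) for c in range(8)] for r in range(4)]
toy_patches = split_into_patches(toy_image, patch_h=4, patch_w=4)
layout = build_sequence_layout(len(toy_patches), ["h", "i"])
mask = causal_mask(len(layout))
```

Under these assumptions there is no separate image encoder: the patch positions sit at the front of the same sequence as the text, and the causal mask lets each text position attend to every patch and to the text generated before it.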
