CV Sep 22, 2025

GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer

Md. Mahmudul Hasan, Ahmed Nesar Tahsin Choudhury, Mahmudul Hasan, Md. Mosaddek Khan

arXiv:2509.18081v16.21 citations h-index: 1 EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the underdeveloped state of handwritten text recognition (HTR) for Bengali, a widely spoken language with a complex script, though it appears incremental because it builds on existing transformer methods.

The authors tackled Bengali handwritten text recognition by developing GraDeT-HTR, a resource-efficient system using a grapheme-based tokenizer and decoder-only transformer, which achieved state-of-the-art performance on multiple benchmarks.

Despite Bengali being the sixth most spoken language in the world, handwritten text recognition (HTR) systems for Bengali remain severely underdeveloped. The complexity of Bengali script--featuring conjuncts, diacritics, and highly variable handwriting styles--combined with a scarcity of annotated datasets makes this task particularly challenging. We present GraDeT-HTR, a resource-efficient Bengali handwritten text recognition system based on a Grapheme-aware Decoder-only Transformer architecture. To address the unique challenges of Bengali script, we augment the performance of a decoder-only transformer by integrating a grapheme-based tokenizer and demonstrate that it significantly improves recognition accuracy compared to conventional subword tokenizers. Our model is pretrained on large-scale synthetic data and fine-tuned on real human-annotated samples, achieving state-of-the-art performance on multiple benchmark datasets.
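The abstract does not spell out how the grapheme-based tokenizer works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a grapheme is a base character plus any combining marks that follow it, and that a hasanta (virama, U+09CD) joins the next consonant into the same conjunct. The names `split_graphemes` and `VIRAMA` are hypothetical, and edge cases such as zero-width joiners are handled only loosely.

```python
import unicodedata

# Bengali hasanta (virama); assumed here to glue consonants into a conjunct.
VIRAMA = "\u09CD"


def split_graphemes(text):
    """Split text into approximate Bengali grapheme clusters.

    Simplified rule set (an assumption, not the paper's tokenizer):
    - combining marks (Unicode categories Mn, Mc) attach to the previous cluster;
    - a non-space character following a virama attaches to the previous cluster.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        joins_previous = bool(clusters) and clusters[-1].endswith(VIRAMA)
        if clusters and (is_mark or joins_previous) and not ch.isspace():
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "বাংলা" (Bangla) splits into two grapheme tokens rather than five code points.
tokens = split_graphemes("\u09ac\u09be\u0982\u09b2\u09be")
```

Under these assumptions, each conjunct or consonant-plus-diacritic unit becomes a single token, which is the kind of script-aware segmentation that the abstract contrasts with conventional subword tokenizers.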
