CV · Mar 19, 2025

Bridging the Gap: Fusing CNNs and Transformers to Decode the Elegance of Handwritten Arabic Script

arXiv:2503.15023v1
Originality: Incremental advance
AI Analysis

This work advances OCR systems for Arabic handwriting, offering a scalable solution for real-world applications, though it is incremental as it builds on existing CNNs and Transformers.

The paper tackled handwritten Arabic script recognition by proposing a hybrid approach combining CNNs and Transformers, achieving 96.38% accuracy for letter classification and 97.22% for positional classification on the IFN/ENIT dataset.

Handwritten Arabic script recognition is a challenging task due to the script's dynamic letter forms and contextual variations. This paper proposes a hybrid approach combining convolutional neural networks (CNNs) and Transformer-based architectures to address these complexities. We evaluated custom and fine-tuned models, including EfficientNet-B7 and Vision Transformer (ViT-B16), and introduced an ensemble model that leverages confidence-based fusion to integrate their strengths. Our ensemble achieves remarkable performance on the IFN/ENIT dataset, with 96.38% accuracy for letter classification and 97.22% for positional classification. The results highlight the complementary nature of CNNs and Transformers, demonstrating their combined potential for robust Arabic handwriting recognition. This work advances OCR systems, offering a scalable solution for real-world applications.
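The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of confidence-based fusion: each model emits class probabilities, and the ensemble keeps the prediction of whichever model is more confident (higher top probability). The function names `softmax` and `fuse_by_confidence`, the tie-breaking choice, and the example logits are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_confidence(
    probs_cnn: Sequence[float],
    probs_vit: Sequence[float],
) -> Tuple[int, float, str]:
    """Hypothetical fusion rule: trust the model with the higher top probability.

    Returns (predicted class index, winning confidence, winning model name).
    Ties go to the CNN; this is an arbitrary assumption of the sketch.
    """
    if len(probs_cnn) != len(probs_vit):
        raise ValueError("both models must score the same set of classes")

    conf_cnn = max(probs_cnn)
    conf_vit = max(probs_vit)

    if conf_cnn >= conf_vit:
        return list(probs_cnn).index(conf_cnn), conf_cnn, "cnn"
    return list(probs_vit).index(conf_vit), conf_vit, "vit"


# Made-up logits for a three-class toy problem (not data from the paper).
cnn_probs = softmax([2.0, 1.0, 0.1])   # CNN mildly prefers class 0
vit_probs = softmax([0.1, 4.0, 0.2])   # ViT strongly prefers class 1

label, confidence, source = fuse_by_confidence(cnn_probs, vit_probs)
print(f"predicted class {label} from {source} (confidence {confidence:.3f})")
```

In this toy case the Transformer's prediction wins because its top probability is higher. A weighted average of the two probability vectors would be an equally plausible reading of "confidence-based fusion"; the pick-the-most-confident rule is shown only because it is the simplest to follow.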

