CV · AI · CL · Jul 18, 2024

Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

arXiv:2407.13559v1 · 31 citations · h-index: 20
Originality: Incremental advance
AI Analysis

The paper presents a leading solution for Arabic script recognition, addressing challenges specific to the script, such as cursive text and diacritics, with incremental improvements in accuracy and efficiency.

The study tackles Arabic optical character recognition (OCR) and handwriting recognition (HWR) by introducing Qalam, a foundation model that achieves a word error rate (WER) of 0.80% on HWR and 1.18% on OCR, significantly outperforming existing methods.

Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose unique challenges due to the cursive and context-sensitive nature of the Arabic script. This study introduces Qalam, a novel foundation model designed for Arabic OCR and HWR, built on a SwinV2 encoder and RoBERTa decoder architecture. Our model significantly outperforms existing methods, achieving a Word Error Rate (WER) of just 0.80% in HWR tasks and 1.18% in OCR tasks. We train Qalam on a diverse dataset, including over 4.5 million images from Arabic manuscripts and a synthetic dataset comprising 60k image-text pairs. Notably, Qalam demonstrates exceptional handling of Arabic diacritics, a critical feature in Arabic scripts. Furthermore, it shows a remarkable ability to process high-resolution inputs, addressing a common limitation in current OCR systems. These advancements underscore Qalam's potential as a leading solution for Arabic script recognition, offering a significant leap in accuracy and efficiency.
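The results above are reported as word error rate. Under the standard definition, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment of the model output against the reference, and N is the number of reference words. The Python sketch below computes WER with a word-level Levenshtein distance and, since the abstract stresses diacritics, also shows how stripping them changes the score. It is illustrative only: the function names `word_error_rate` and `strip_diacritics`, whitespace tokenization, and the choice of U+064B to U+0652 as the diacritic set are assumptions, not Qalam's evaluation protocol.

```python
"""Word error rate (WER) sketch with optional Arabic diacritic stripping.

Illustrative only: the function names and the diacritic range below are
assumptions, not taken from the Qalam paper's evaluation code.
"""

# Assumption: treat the eight core harakat U+064B..U+0652 (fathatan through
# sukun) as "diacritics"; the paper's exact normalization is not stated.
ARABIC_HARAKAT = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_diacritics(text: str) -> str:
    """Remove the assumed harakat set from a string."""
    return "".join(ch for ch in text if ch not in ARABIC_HARAKAT)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Words are obtained by whitespace splitting (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # The second word differs only in its diacritics.
    ref = "كَتَبَ الْوَلَدُ"
    hyp = "كَتَبَ الولد"
    print("WER with diacritics:   ", word_error_rate(ref, hyp))
    print("WER without diacritics:",
          word_error_rate(strip_diacritics(ref), strip_diacritics(hyp)))
```

Because WER scores whole words, a single missing or wrong diacritic turns an otherwise correct word into a substitution, which is one reason diacritic handling has a visible effect on reported Arabic WER.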

