CV · AI · Jun 2, 2025

QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation

arXiv:2506.02295v1 · 8 citations · h-index: 39 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the persistent problem of accurate Arabic text recognition, which affects users in document processing and research, and reports a marked improvement in accuracy and efficiency.

The paper tackles Arabic script OCR by developing Qari-OCR, a series of vision-language models fine-tuned on synthetic datasets. It reports a new open-source state of the art, with a Word Error Rate of 0.160 and a Character Error Rate of 0.061 on diacritically-rich texts.

The inherent complexities of Arabic script, namely its cursive nature, diacritical marks (tashkeel), and varied typography, pose persistent challenges for Optical Character Recognition (OCR). We present Qari-OCR, a series of vision-language models derived from Qwen2-VL-2B-Instruct, progressively optimized for Arabic through iterative fine-tuning on specialized synthetic datasets. Our leading model, QARI v0.2, establishes a new open-source state-of-the-art with a Word Error Rate (WER) of 0.160, Character Error Rate (CER) of 0.061, and BLEU score of 0.737 on diacritically-rich texts. Qari-OCR demonstrates superior handling of tashkeel, diverse fonts, and document layouts, alongside impressive performance on low-resolution images. Further explorations (QARI v0.3) showcase strong potential for structural document understanding and handwritten text. This work delivers a marked improvement in Arabic OCR accuracy and efficiency, with all models and datasets released to foster further research.
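To make the reported metrics concrete, here is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between reference and OCR output, divided by the reference length in words or characters. The function names are illustrative, and the sketch assumes plain whitespace tokenization with no text normalization. The abstract does not describe the paper's actual evaluation protocol, which may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: code-point edit distance / reference length.

    Assumption: each Unicode code point counts as one character, so every
    Arabic diacritic (a combining mark) is scored separately.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under the code-point assumption, an output that drops all three diacritics from a fully vocalized three-letter word (six code points) scores a CER of 0.5 and a WER of 1.0 for that word, which illustrates why diacritically-rich text is a demanding benchmark.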

