CL · Dec 10, 2023

Arabic Handwritten Text Line Dataset

arXiv:2312.07573v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work provides a domain-specific resource for researchers in Arabic text recognition, but it is incremental as it builds on existing line-level datasets.

The authors tackled the lack of annotated word-level datasets for historical Arabic script by presenting a new dataset that annotates word positions, addressing a gap in segmentation for recognition systems.

Segmentation of Arabic manuscripts into text lines and words is an important step toward making recognition systems more efficient and accurate. The problem of segmentation into text lines is solved, since there are carefully annotated datasets dedicated to this task. However, to the best of our knowledge, there is no dataset that annotates word positions in Arabic texts. In this paper, we present a new dataset specifically designed for historical Arabic script, in which we annotate positions at the word level.

