CV, DL · Mar 28, 2017

Important New Developments in Arabographic Optical Character Recognition (OCR)

arXiv:1703.09550v1 · 25 citations · Has Code

The OpenITI team has achieved Optical Character Recognition (OCR) accuracy rates in the high nineties for classical Arabic-script texts. These figures come from our tests on seven Arabic-script texts of varying quality and typeface, totaling over 7,000 lines. These accuracy rates represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for classical Arabic-script texts. Equally important, they are produced with open-source OCR software, which lets us make this Arabic-script OCR technology freely available to the broader Islamic, Persian, and Arabic Studies communities.
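The abstract reports accuracy "in the high nineties" but does not state the formula. A common OCR metric is character accuracy: one minus the character error rate, where errors are the Levenshtein edit distance between the ground-truth line and the OCR output. The sketch below assumes that metric. The paper's own evaluation may differ. The function names `levenshtein` and `character_accuracy` are hypothetical and are not part of any OpenITI tool.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn hyp into ref (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # r missing from hyp
                            curr[j - 1] + 1,      # h is extra in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(pairs):
    """Assumed metric: 1 - (total edit distance / total ground-truth chars),
    aggregated over (ground_truth, ocr_output) line pairs."""
    errors = sum(levenshtein(gt, ocr) for gt, ocr in pairs)
    total = sum(len(gt) for gt, _ in pairs)
    return 1 - errors / total if total else 1.0


# Two toy lines: one read perfectly, one with a dropped alif.
pairs = [("كتاب", "كتاب"), ("قال", "قل")]
print(character_accuracy(pairs))  # 1 edit over 7 characters
```

Here a "character" is a Unicode code point. Arabic vowel marks and other diacritics are separate combining code points, so the reported figure depends on whether they are kept, stripped, or normalized before scoring.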
