Sarah Bowen Savant

1 Paper

CV · Mar 28, 2017 · Code
Important New Developments in Arabographic Optical Character Recognition (OCR)

Maxim Romanov, Matthew Thomas Miller, Sarah Bowen Savant et al.

The OpenITI team has achieved Optical Character Recognition (OCR) accuracy rates in the high nineties for classical Arabic-script texts. These figures come from our tests on seven Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines. The rates are a distinct improvement over the actual accuracy of the various proprietary OCR options for classical Arabic-script texts. Equally important, they were produced with open-source OCR software, which allows us to make this Arabic-script OCR technology freely available to the broader Islamic, Persian, and Arabic Studies communities.
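
For readers unfamiliar with how OCR accuracy figures of this kind are typically computed, the sketch below shows one common convention: character accuracy as one minus the total edit distance divided by the total number of reference characters, aggregated over all test lines. This is an illustrative assumption, not necessarily the exact metric used by the OpenITI team; the function names `edit_distance` and `character_accuracy` and the sample strings are hypothetical.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference line and an OCR output line."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Aggregate character accuracy over (reference, ocr_output) line pairs.

    Assumed definition: 1 - (total edit distance / total reference characters).
    """
    pairs = list(pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return 1.0 - total_errors / total_chars


if __name__ == "__main__":
    # Hypothetical ground-truth lines paired with hypothetical OCR output.
    sample = [
        ("كتاب", "كتاب"),  # perfect line
        ("قال", "فال"),    # one substituted character
    ]
    print(f"character accuracy: {character_accuracy(sample):.4f}")
```

Aggregating errors over all lines, rather than averaging per-line scores, keeps short lines from weighing as much as long ones. Python compares strings by code point, so visually identical text in different Unicode normalization forms would count as errors unless both sides are normalized first.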