IR CL Apr 21, 2016

OCR Error Correction Using Character Correction and Feature-Based Word Classification

arXiv:1604.06225v1 14.676 citations

Originality: Incremental advance

AI Analysis

This paper addresses OCR accuracy problems that affect Arabic text processing, but the advance is incremental because it builds on existing correction methods.

The paper tackles OCR error correction with a learned classifier that integrates a weighted confusion matrix and a shallow language model, which improves the correction of segmentation and recognition errors in Arabic text.

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.
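As a rough illustration of how a weighted confusion matrix and a shallow language model can be combined, here is a minimal sketch. It is not the paper's learned classifier: it simply sums two log scores and picks the best candidate. The confusion weights, the word counts, and the function names (`candidates`, `score`, `correct`) are hypothetical, and Latin-script toy data stands in for Arabic to keep the example short.

```python
import math

# Hypothetical confusion weights: (observed, intended) -> P(observed | intended).
# Multi-character keys such as "rn" -> "m" stand in for segmentation errors.
CONFUSION = {("l", "i"): 0.2, ("0", "o"): 0.3, ("rn", "m"): 0.25}

# Hypothetical unigram counts acting as a very shallow language model.
WORD_COUNTS = {"time": 50, "tile": 5, "modern": 20}
TOTAL = sum(WORD_COUNTS.values())


def candidates(token):
    """Yield (candidate, weight) pairs by undoing one confusion at one position."""
    for (observed, intended), weight in CONFUSION.items():
        start = token.find(observed)
        while start != -1:
            fixed = token[:start] + intended + token[start + len(observed):]
            yield fixed, weight
            start = token.find(observed, start + 1)


def score(candidate, weight):
    """Sum of log confusion weight and log unigram probability."""
    count = WORD_COUNTS.get(candidate, 0)
    if count == 0:
        return -math.inf  # out-of-vocabulary candidates are never selected
    return math.log(weight) + math.log(count / TOTAL)


def correct(token):
    """Return the best-scoring in-vocabulary candidate, or the token unchanged."""
    if token in WORD_COUNTS:
        return token
    best, best_score = token, -math.inf
    for cand, weight in candidates(token):
        s = score(cand, weight)
        if s > best_score:
            best, best_score = cand, s
    return best
```

For example, `correct("tlme")` undoes the assumed `l`/`i` confusion and `correct("rnodern")` undoes the assumed `rn`/`m` segmentation confusion. A learned classifier, as described in the abstract, would replace the fixed sum in `score` with weights fitted from training data.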
