A Compositional Textual Model for Recognition of Imperfect Word Images
This work addresses a specific bottleneck in industrial OCR systems by improving recognition of incomplete text regions, though it appears incremental because it builds on existing deep learning methods.
The paper tackles text recognition from imperfect images by developing a compositional model that explicitly represents geometric perturbations, enabling recovery of missing characters and improving recognition when detection returns incomplete information.
Printed text recognition is an important problem for industrial OCR systems. Printed text is constructed in a standard procedural fashion in most settings. We develop a mathematical model for this process that can be applied to the backward inference problem of text recognition from an image. Through ablation experiments we show that this model is realistic and that a multi-task objective setting can help to stabilize estimation of its free parameters, enabling use of conventional deep learning methods. Furthermore, by directly modeling the geometric perturbations of text synthesis we show that our model can help recover missing characters from incomplete text regions, the bane of multicomponent OCR systems, enabling recognition even when the detection returns incomplete information.
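As a rough illustration of the multi-task objective mentioned above, the sketch below combines a character recognition term with a term on the geometric perturbation parameters. It is not the paper's formulation: the function names, the squared-error geometry term, and the weight `geom_weight` are all assumptions chosen for illustration.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def squared_error(predicted, actual):
    """Sum of squared differences between two equal-length parameter vectors."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def multi_task_loss(char_probs, char_targets, geom_pred, geom_true, geom_weight=0.5):
    """Hypothetical combined objective (an assumption, not the paper's loss).

    char_probs   : list of per-character probability vectors
    char_targets : list of target class indices, one per character
    geom_pred    : predicted geometric parameters (e.g. offset, scale)
    geom_true    : geometric parameters used during synthesis
    geom_weight  : assumed trade-off weight between the two tasks
    """
    # Recognition task: mean cross-entropy over character positions.
    recognition = sum(
        cross_entropy(p, t) for p, t in zip(char_probs, char_targets)
    ) / len(char_targets)
    # Auxiliary task: regression on the geometric perturbation parameters.
    geometry = squared_error(geom_pred, geom_true)
    return recognition + geom_weight * geometry


# Example: two character positions, two geometric parameters.
loss = multi_task_loss(
    char_probs=[[0.9, 0.1], [0.2, 0.8]],
    char_targets=[0, 1],
    geom_pred=[0.1, 1.0],
    geom_true=[0.0, 1.0],
)
```

The intuition behind such a design is that supervising the geometric parameters directly gives the estimator an additional training signal, which is one plausible way an auxiliary task could stabilize estimation of a model's free parameters.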