A Bayesian model for recognizing handwritten mathematical expressions
This work addresses ambiguity in handwritten input, a practical problem for users in fields like education or document digitization. The contribution is incremental: it adds probabilistic scoring to an existing recognition approach.
The paper tackles the recognition of handwritten mathematical expressions by presenting a system that captures all recognizable interpretations of the input and organizes them in a parse forest, using a novel Bayesian network for probabilistic tree scoring. Two accuracy evaluations show that the system is more accurate than its previous versions, which used non-probabilistic methods, and than other academic math recognizers.
Recognizing handwritten mathematics is a challenging classification problem, requiring simultaneous identification of all the symbols comprising an input as well as the complex two-dimensional relationships between symbols and subexpressions. Because of the ambiguity present in handwritten input, it is often unrealistic to hope for consistently perfect recognition accuracy. We present a system which captures all recognizable interpretations of the input and organizes them in a parse forest from which individual parse trees may be extracted and reported. If the top-ranked interpretation is incorrect, the user may request alternates and select the recognition result they desire. The tree extraction step uses a novel probabilistic tree scoring strategy in which a Bayesian network is constructed based on the structure of the input, and each joint variable assignment corresponds to a different parse tree. Parse trees are then reported in order of decreasing probability. Two accuracy evaluations demonstrate that the resulting recognition system is more accurate than previous versions (which used non-probabilistic methods) and other academic math recognizers.
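The scoring idea described above, in which each joint variable assignment of a Bayesian network corresponds to one parse tree and trees are reported in order of decreasing probability, can be illustrated with a toy sketch. This is not the paper's network: the two symbol variables, the single relation variable, the dependency of the relation on the second symbol, and every probability below are assumptions chosen only to make the example concrete.

```python
from itertools import product

# Hypothetical symbol distributions for a two-stroke-group input.
# All numbers are illustrative assumptions, not values from the paper.
P_S1 = {"x": 0.7, "X": 0.3}   # candidate identities of the first symbol
P_S2 = {"2": 0.6, "z": 0.4}   # candidate identities of the second symbol

# Hypothetical conditional distribution P(relation | second symbol).
P_R_GIVEN_S2 = {
    "2": {"superscript": 0.8, "inline": 0.2},
    "z": {"superscript": 0.3, "inline": 0.7},
}
RELATIONS = ("superscript", "inline")


def ranked_interpretations():
    """Enumerate every joint assignment and sort by joint probability.

    Each assignment (s1, s2, relation) stands in for one parse tree.
    The joint probability factorizes as P(s1) * P(s2) * P(relation | s2).
    """
    scored = []
    for s1, s2, rel in product(P_S1, P_S2, RELATIONS):
        p = P_S1[s1] * P_S2[s2] * P_R_GIVEN_S2[s2][rel]
        scored.append(((s1, s2, rel), p))
    # Report interpretations in order of decreasing probability.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    for assignment, prob in ranked_interpretations():
        print(assignment, round(prob, 3))
```

With these assumed numbers the top-ranked assignment reads the input as "x" with a superscript "2", and the remaining entries play the role of the alternates a user could request if the top result is wrong. Exhaustive enumeration is used here only because the toy network is tiny; the abstract does not say how the actual system extracts trees from the parse forest.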