Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition
This work addresses the problem of accurately recognizing complex handwritten mathematical expressions, with applications in education and document digitization. It represents an incremental improvement with specific, measurable gains.
The paper tackles handwritten mathematical expression recognition by improving an attention-based encoder-decoder model with densely connected convolutional networks and a novel multi-scale attention mechanism, achieving expression recognition accuracies of 52.8% on CROHME 2014 and 50.1% on CROHME 2016, significantly outperforming state-of-the-art methods.
Handwritten mathematical expression recognition is a challenging problem due to complicated two-dimensional structures, ambiguous handwriting input, and the varying scales of handwritten math symbols. To address this problem, we use an attention-based encoder-decoder model that translates mathematical expression images from two-dimensional layouts into one-dimensional LaTeX strings. We improve the encoder by employing densely connected convolutional networks, as they can strengthen feature extraction and facilitate gradient propagation, especially on a small training set. We also present a novel multi-scale attention model that handles math symbols of different scales and preserves the fine-grained details that would otherwise be lost to pooling operations. Validated on the CROHME competition task, the proposed method significantly outperforms the state-of-the-art methods with an expression recognition accuracy of 52.8% on CROHME 2014 and 50.1% on CROHME 2016, using only the official training dataset.
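To make the multi-scale idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a high-resolution feature grid, derives a low-resolution grid by 2x2 average pooling, attends over both with the same decoder query, and concatenates the two context vectors. Plain dot-product attention is used as a stand-in, since the abstract does not specify the scoring function. The names `attend`, `avg_pool_2x2`, and `multi_scale_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Dot-product attention (an assumed stand-in scoring function).

    features: list of feature vectors, one per spatial position.
    query: decoder state vector with the same dimension.
    Returns (context vector, attention weights).
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


def avg_pool_2x2(grid):
    """2x2 average pooling over an H x W grid of vectors (H and W even)."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            cells = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            dim = len(cells[0])
            row.append([sum(cell[d] for cell in cells) / 4.0 for d in range(dim)])
        pooled.append(row)
    return pooled


def multi_scale_context(high_res_grid, query):
    """Attend at two scales and concatenate the resulting context vectors."""
    low_res_grid = avg_pool_2x2(high_res_grid)
    flat_high = [v for row in high_res_grid for v in row]
    flat_low = [v for row in low_res_grid for v in row]
    ctx_low, w_low = attend(flat_low, query)
    ctx_high, w_high = attend(flat_high, query)
    # List concatenation: the decoder would see both scales side by side.
    return ctx_low + ctx_high, w_low, w_high


if __name__ == "__main__":
    # Toy 4x4 grid of 2-dimensional features; the values are illustrative only.
    grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    context, w_low, w_high = multi_scale_context(grid, [1.0, 0.0])
    print(len(context), len(w_low), len(w_high))
```

In this sketch the high-resolution branch keeps one attention weight per original grid cell, so small symbols that pooling would blur can still receive focused attention, while the low-resolution branch summarizes larger regions.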