CV, May 6, 2021

Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer

arXiv:2105.02412v3, 76 citations
AI Analysis

This work addresses inefficiencies in handwritten mathematical expression recognition, a task that matters for educational and document-digitization applications. The contribution is incremental: it keeps the existing encoder-decoder design and makes specific modifications to it.

The paper targets two problems in handwritten mathematical expression recognition: inaccurate assignment of attention to image features, and inefficient processing of long LaTeX sequences. It replaces the RNN-based decoder with a transformer-based decoder and adds a novel bidirectional training strategy. Compared with prior methods that do not use data augmentation, this improves ExpRate by 2.23%, 1.92%, and 2.28% on CROHME 2014, 2016, and 2019 respectively.
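The ExpRate figures above can be made concrete with a small sketch. It assumes the common definition of ExpRate as the percentage of expressions whose predicted LaTeX token sequence matches the ground truth exactly; the function name `exp_rate` and the token-list input format are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def exp_rate(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Return the percentage of predictions that match their reference exactly.

    Assumption: an expression counts as correct only when every token matches,
    in order. Partial credit is not given.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")

    # Compare token sequences element by element after normalising to lists.
    correct = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical predictions for two expressions: the first is right, the second is wrong.
preds = [["x", "^", "2"], ["a", "+", "b"]]
refs = [["x", "^", "2"], ["a", "-", "b"]]
score = exp_rate(preds, refs)
```

Under this definition a single wrong token makes the whole expression count as an error, which is why gains of about two percentage points are reported as meaningful.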

Encoder-decoder models have made great progress on handwritten mathematical expression recognition recently. However, it is still a challenge for existing methods to assign attention to image features accurately. Moreover, those encoder-decoder models usually adopt RNN-based models in their decoder part, which makes them inefficient in processing long $\LaTeX{}$ sequences. In this paper, a transformer-based decoder is employed to replace RNN-based ones, which makes the whole model architecture very concise. Furthermore, a novel training strategy is introduced to fully exploit the potential of the transformer in bidirectional language modeling. Compared to several methods that do not use data augmentation, experiments demonstrate that our model improves the ExpRate of current state-of-the-art methods on CROHME 2014 by 2.23%. Similarly, on CROHME 2016 and CROHME 2019, we improve the ExpRate by 1.92% and 2.28% respectively.
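The abstract does not spell out how the bidirectional training strategy is implemented. As a minimal sketch, assume a single decoder is trained on both the left-to-right and the right-to-left version of each target sequence, with the two directions distinguished by swapping the start and end markers. The marker strings and function names below are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

SOS = "<sos>"  # hypothetical start marker
EOS = "<eos>"  # hypothetical end marker


def make_bidirectional_targets(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Build left-to-right and right-to-left target sequences for one expression.

    Assumption: the right-to-left sequence reverses the tokens and swaps the
    start and end markers, so one decoder can tell the directions apart.
    """
    l2r = [SOS] + tokens + [EOS]
    r2l = [EOS] + tokens[::-1] + [SOS]
    return l2r, r2l


def shift_for_teacher_forcing(seq: List[str]) -> Tuple[List[str], List[str]]:
    """Split a full sequence into decoder input and next-token targets."""
    return seq[:-1], seq[1:]


# Example: the LaTeX expression x^2 as a token list.
l2r, r2l = make_bidirectional_targets(["x", "^", "2"])
l2r_input, l2r_target = shift_for_teacher_forcing(l2r)
r2l_input, r2l_target = shift_for_teacher_forcing(r2l)
```

In this sketch both sequences would be placed in the same training batch, so the same decoder weights learn to predict the next token in either direction.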
