CV · Mar 9

A Hybrid Vision Transformer Approach for Mathematical Expression Recognition

Anh Duy Le, Van Linh Pham, Vinh Loi Ly, Nam Quan Nguyen, Huu Thang Nguyen, Tuan Anh Tran

arXiv:2603.07929v14.8 · h-index: 13

Predicted impact: top 98% in CV · last 90 days
Originality: Incremental advance

AI Analysis

This work provides an incremental improvement in mathematical expression recognition for document analysis systems.

This paper addresses mathematical expression recognition, a complex task due to the two-dimensional structure and varying symbol sizes. The proposed Hybrid Vision Transformer (HVT) approach achieved a BLEU score of 89.94 on the IM2LATEX-100K dataset, outperforming current state-of-the-art methods.

One of the crucial challenges in document analysis is mathematical expression recognition. Unlike text recognition, which deals only with images of one-dimensional structure, mathematical expression recognition is a much more complicated problem because of its two-dimensional structure and varying symbol sizes. In this paper, we propose using a Hybrid Vision Transformer (HVT) with 2D positional encoding as the encoder to extract the complex relationships between symbols from the image. A coverage attention decoder is used to better track the attention history and so handle the under-parsing and over-parsing problems. We also show the benefit of using the [CLS] token of ViT as the initial embedding of the decoder. Experiments on the IM2LATEX-100K dataset show the effectiveness of our method, which achieves a BLEU score of 89.94 and outperforms current state-of-the-art methods.
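
To make the idea of a 2D positional encoding concrete, the sketch below builds one common sinusoidal variant for a grid of image patches: half of each patch's channels encode its row and the other half its column. The abstract does not state which formulation the paper uses, so the function names (`sincos_1d`, `positional_encoding_2d`), the base constant 10000, and the row/column channel split are all assumptions made for illustration.

```python
import math


def sincos_1d(position, dim):
    """Encode one integer position into `dim` values (dim must be even).

    Uses the standard sinusoidal scheme: sin/cos pairs at
    geometrically decreasing frequencies.
    """
    values = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        values.append(math.sin(position * freq))
        values.append(math.cos(position * freq))
    return values


def positional_encoding_2d(height, width, d_model):
    """Return a height x width grid of d_model-dimensional encodings.

    Assumption (not taken from the paper): the first d_model/2 channels
    encode the row index and the last d_model/2 encode the column index.
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            row.append(sincos_1d(y, half) + sincos_1d(x, half))
        grid.append(row)
    return grid


if __name__ == "__main__":
    pe = positional_encoding_2d(height=2, width=3, d_model=8)
    # Patches in the same row share the first half of their encoding;
    # patches in the same column share the second half.
    print(pe[1][0][:4] == pe[1][2][:4])
    print(pe[0][1][4:] == pe[1][1][4:])
```

In a ViT-style encoder, a grid like this would typically be flattened and added to the patch embeddings, so that attention can distinguish vertical offsets (superscripts, fractions) from horizontal ones.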
