CL, CV | Aug 7, 2024

Image-to-LaTeX Converter for Mathematical Formulas and Text

arXiv:2408.04015v13.44 citations | h-index: 4 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides an open-source tool for researchers and students who need automated LaTeX generation from images, but the contribution is incremental because it builds on existing encoder-decoder architectures.

The authors address the problem of converting images of mathematical formulas and text into LaTeX code by training vision encoder-decoder models, and they compare the BLEU score of their model on a handwritten test set against models such as Pix2Text, TexTeller, and Sumen.

In this project, we train a vision encoder-decoder model to generate LaTeX code from images of mathematical formulas and text. Utilizing a diverse collection of image-to-LaTeX data, we build two models: a base model with a Swin Transformer encoder and a GPT-2 decoder, trained on machine-generated images, and a fine-tuned version enhanced with Low-Rank Adaptation (LoRA) trained on handwritten formulas. We then compare the BLEU performance of our specialized model on a handwritten test set with other similar models, such as Pix2Text, TexTeller, and Sumen. Through this project, we contribute open-source models for converting images to LaTeX and provide from-scratch code for building these models with distributed training and GPU optimizations.
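
The abstract compares models by BLEU score on predicted LaTeX. As a rough illustration of what that metric measures, here is a minimal sketch of unsmoothed sentence-level BLEU in plain Python. It is not the authors' evaluation code: the whitespace tokenization, the 4-gram limit, and the names `ngrams` and `sentence_bleu` are assumptions made for this example, and the paper may use a different BLEU variant or tokenizer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed BLEU for one reference/candidate pair.

    Assumption: LaTeX strings are split on whitespace. Returns 0.0 when
    any n-gram order has no match, since no smoothing is applied.
    """
    ref = reference.split()
    cand = candidate.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clipped counts: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = r"\frac { a } { b } + c"
exact = sentence_bleu(reference, r"\frac { a } { b } + c")
one_wrong_token = sentence_bleu(reference, r"\frac { a } { b } - c")
truncated = sentence_bleu(reference, r"\frac { a }")
```

In this sketch, a single wrong operator lowers the score because it breaks every n-gram that contains it, and a truncated prediction is penalized only through the brevity penalty. Scores reported for real systems are usually computed over a whole test set rather than per sentence.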
