CL CV Aug 7, 2024

Image-to-LaTeX Converter for Mathematical Formulas and Text

arXiv:2408.04015v14 citations h-index: 4 Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides an open-source tool for researchers and students who need automated LaTeX generation from images. The contribution is incremental, since it builds on existing encoder-decoder architectures.

The authors tackle the problem of converting images of mathematical formulas and text into LaTeX code by training vision encoder-decoder models. They evaluate the handwriting-specialized model by comparing its BLEU score on a handwritten test set against Pix2Text, TexTeller, and Sumen.
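To make the BLEU comparison concrete, the following is a minimal sketch of sentence-level BLEU for a predicted LaTeX string against a single reference. It assumes whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing; the paper's actual tokenizer and BLEU settings are not stated here, and the function names `ngrams` and `bleu` are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with one reference (assumed settings, see lead-in)."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped overlap: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    truth = r"\frac { a } { b } + c"
    print(bleu(truth, r"\frac { a } { b } + c"))
    print(bleu(truth, r"\frac { a } { b }"))
```

Because the score is unsmoothed, very short formulas with fewer than four tokens always score zero; a corpus-level or smoothed variant would be the usual choice for a full test set.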

In this project, we train a vision encoder-decoder model to generate LaTeX code from images of mathematical formulas and text. Utilizing a diverse collection of image-to-LaTeX data, we build two models: a base model with a Swin Transformer encoder and a GPT-2 decoder, trained on machine-generated images, and a fine-tuned version enhanced with Low-Rank Adaptation (LoRA) trained on handwritten formulas. We then compare the BLEU performance of our specialized model on a handwritten test set with other similar models, such as Pix2Text, TexTeller, and Sumen. Through this project, we contribute open-source models for converting images to LaTeX and provide from-scratch code for building these models with distributed training and GPU optimizations.
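The LoRA fine-tuning step can be illustrated with a small sketch of the low-rank update itself. LoRA keeps a pretrained weight matrix W frozen and learns two small matrices, B (d_out x r) and A (r x d_in), so that the effective weight is W + (alpha / r) * B A. The code below is plain Python for illustration only, not the authors' training code; the helper names and the dimensions in the usage example (768, rank 8) are assumptions chosen for the example, not values taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters in the adapter: A has r*d_in, B has d_out*r."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]
    a = [[0.5, -0.5]]          # rank r = 1
    b_zero = [[0.0], [0.0]]    # B starts at zero, so the model is unchanged
    print(lora_effective_weight(w, a, b_zero, alpha=16))
    # Hypothetical layer size, chosen only to show the parameter savings.
    print(lora_param_count(768, 768, 8), "adapter params vs", 768 * 768, "full")
```

Initializing B to zero means fine-tuning starts exactly from the base model's behavior, and only the small adapter matrices need gradients, which is what makes adapting the machine-generated-image model to handwritten formulas comparatively cheap.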

Code Implementations: 1 repo
