CV
Sep 21, 2024

LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement

arXiv:2409.14201v2
9 citations
h-index: 7
AI Analysis

This paper addresses the challenge of modifying or exporting LaTeX sources from PDFs, which matters to researchers and professionals who work with scientific documents. The contribution is incremental, since it builds on prior recognition work.

The paper tackles the problem of accurately extracting LaTeX source code from PDF images for tables and formulae, proposing LATTE, an iterative refinement framework that improves extraction accuracy by at least 7.03% in exact match over existing methods, with success refinement rates of 46.08% for formulae and 25.51% for tables.

Portable Document Format (PDF) files are widely used for storing and disseminating scientific research, legal documents, and tax information. LaTeX is a popular application for creating PDF documents. Despite its advantages, LaTeX is not WYSIWYG (what you see is what you get): the LaTeX source and the rendered PDF images look drastically different, especially for formulae and tables. This gap makes it hard to modify or export LaTeX sources for formulae and tables from PDF images, and existing work is still limited. First, prior work generates LaTeX sources in a single iteration and struggles with complex LaTeX formulae. Second, existing work mainly recognizes and extracts LaTeX sources for formulae, and is incapable or ineffective for tables. This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition. Specifically, we propose delta-view as feedback, which compares the rendered image of the extracted LaTeX source with the expected correct image and pinpoints the differences between the two. Such delta-view feedback enables our fault localization model to localize the faulty parts of the incorrect recognition more accurately and enables our LaTeX refinement model to repair the incorrect extraction more accurately. LATTE improves the LaTeX source extraction accuracy of both LaTeX formulae and tables, outperforming existing techniques as well as GPT-4V by at least 7.03% of exact match, with a success refinement rate of 46.08% (formula) and 25.51% (table).
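
The render-compare-refine loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the delta-view is approximated as a plain pixel-wise difference between two grayscale images, and `delta_view`, `refine_iteratively`, `render`, and `refine` are hypothetical names. In particular, `render` stands in for a LaTeX renderer and `refine` stands in for the fault localization and refinement models, which the abstract describes only at a high level.

```python
import numpy as np


def delta_view(rendered, expected, threshold=0):
    """Compare two grayscale images (2-D uint8 arrays).

    Returns a boolean mask of differing pixels and the bounding box
    (top, left, bottom, right) of the differences, or None if the
    images match. A pixel-wise difference is an illustrative
    assumption, not the paper's exact delta-view construction.
    """
    # Pad both images with white (255) to a common shape.
    height = max(rendered.shape[0], expected.shape[0])
    width = max(rendered.shape[1], expected.shape[1])

    def pad(img):
        out = np.full((height, width), 255, dtype=np.uint8)
        out[: img.shape[0], : img.shape[1]] = img
        return out

    a, b = pad(rendered), pad(expected)
    # Use a signed type so the subtraction cannot wrap around.
    mask = np.abs(a.astype(np.int16) - b.astype(np.int16)) > threshold
    if not mask.any():
        return mask, None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    bbox = (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))
    return mask, bbox


def refine_iteratively(source, expected, render, refine, max_iters=3):
    """Repeat render -> compare -> refine until the images match.

    `render(source)` and `refine(source, mask, bbox)` are hypothetical
    callables supplied by the caller; `max_iters` is an assumed budget.
    """
    for _ in range(max_iters):
        mask, bbox = delta_view(render(source), expected)
        if bbox is None:  # rendered image matches the expected one
            return source
        source = refine(source, mask, bbox)
    return source
```

The loop stops early once the rendered candidate is pixel-identical to the expected image, and otherwise returns the last candidate after the iteration budget is spent. The bounding box is one simple way to pass "where the images differ" to a downstream model; a learned model could just as well consume the full mask.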
