CV · Aug 1, 2025

DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios

arXiv:2508.00311v1 · 2 citations · h-index: 6
Originality: Highly original
AI Analysis

This paper addresses the challenge of accurate formula recognition for scientific literature analysis and represents a new paradigm rather than an incremental improvement.

The paper tackles the problem of recognizing mathematical formulas in complex and structured scientific documents, achieving state-of-the-art performance across various styles, domains, and layouts through a unified framework and a new dataset.

Optical Character Recognition (OCR) for mathematical formulas is essential for the intelligent analysis of scientific literature. However, both task-specific and general vision-language models often struggle to handle the structural diversity, complexity, and real-world variability inherent in mathematical content. In this work, we present DocTron-Formula, a unified framework built upon general vision-language models, thereby eliminating the need for specialized architectures. Furthermore, we introduce CSFormula, a large-scale and challenging dataset that encompasses multidisciplinary and structurally complex formulas at the line, paragraph, and page levels. Through straightforward supervised fine-tuning, our approach achieves state-of-the-art performance across a variety of styles, scientific domains, and complex layouts. Experimental results demonstrate that our method not only surpasses specialized models in accuracy and robustness, but also establishes a new paradigm for the automated understanding of complex scientific documents.
