Texo: Formula Recognition within 20M Parameters
This work addresses efficient formula recognition for end users and enables deployment on consumer-grade hardware, but its contribution is incremental: it builds on existing methods and focuses mainly on reducing model size.
The paper tackles formula recognition by introducing Texo, a minimalist model with only 20 million parameters. It achieves performance comparable to state-of-the-art models while being 80% and 65% smaller than UniMERNet-T and PPFormulaNet-S, respectively, which enables real-time inference on consumer hardware.
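As a quick sanity check on the size claims, the following worked arithmetic assumes (this is not stated explicitly in the text) that both reductions are measured in parameter count relative to Texo's 20M parameters. Under that assumption, the implied baseline sizes are roughly:

```latex
N_{\text{baseline}} = \frac{N_{\text{Texo}}}{1 - r}, \qquad N_{\text{Texo}} = 20\text{M}

N_{\text{UniMERNet-T}} \approx \frac{20\text{M}}{1 - 0.80} = 100\text{M}, \qquad
N_{\text{PPFormulaNet-S}} \approx \frac{20\text{M}}{1 - 0.65} \approx 57\text{M}
```

These derived figures are estimates from the stated percentages, not numbers reported in the abstract.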
In this paper, we present Texo, a minimalist yet high-performance formula recognition model that contains only 20 million parameters. Through careful design, distillation, and transfer of the vocabulary and the tokenizer, Texo achieves performance comparable to state-of-the-art models such as UniMERNet-T and PPFormulaNet-S, while reducing the model size by 80% and 65%, respectively. This enables real-time inference on consumer-grade hardware and even in-browser deployment. We also developed a web application to demonstrate the model's capabilities and make it easier for end users to use.
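The abstract does not specify how distillation is performed. For orientation only, here is a minimal sketch of standard logit-level knowledge distillation (Hinton-style soft targets mixed with hard-label cross-entropy) applied per decoding step of a formula-to-LaTeX decoder. It is not presented as Texo's method. Logit-level distillation of this kind requires the teacher and student to share a vocabulary, which is one reason transferring the vocabulary and tokenizer can matter. The function names, `temperature`, and `alpha` are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature-scaled softmax; subtracting the max keeps exp() numerically stable.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_token_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    # Hypothetical per-token loss: both logit vectors must index the SAME vocabulary.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) over the softened distributions.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Ordinary cross-entropy against the ground-truth token at temperature 1.
    ce = -math.log(softmax(student_logits)[target])
    # The T^2 factor keeps soft-target gradients on a comparable scale to the CE term.
    return alpha * (temperature ** 2) * kl + (1.0 - alpha) * ce


def kd_sequence_loss(
    student_seq: Sequence[Sequence[float]],
    teacher_seq: Sequence[Sequence[float]],
    targets: Sequence[int],
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    # Average the per-token loss over all decoding steps of one output sequence.
    losses = [
        kd_token_loss(s, t, y, temperature, alpha)
        for s, t, y in zip(student_seq, teacher_seq, targets)
    ]
    return sum(losses) / len(losses)
```

When the student's logits already match the teacher's, the KL term vanishes and only the hard-label term remains, so `alpha` controls how strongly the small model is pulled toward the larger teacher's output distribution.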