CV · Mar 31

MathGen: Revealing the Illusion of Mathematical Competence through Text-to-Image Generation

Ruiyao Liu, Hui Shen, Ping Zhang, Yunta Hsieh, Yifan Zhang, Jing Xu, Sicheng Chen, Junchen Li, Jiawei Lu, Jianing Ma, Jiaqi Mo, Qi Han

arXiv:2603.27959 · 93.3 · h-index: 11 · Has Code

AI Analysis

This work highlights a critical bottleneck in AI for applications requiring precise visual mathematical outputs, such as education or scientific visualization, though the contribution itself is incremental, since it benchmarks existing models.

The paper asks whether generative models can produce correct visual representations of mathematical solutions and finds that current text-to-image models perform poorly: the best closed-source model achieves only 42.0% accuracy, and open-source models reach only about 1-11%.

Modern generative models have demonstrated the ability to solve challenging mathematical problems. In many real-world settings, however, mathematical solutions must be expressed visually through diagrams, plots, geometric constructions, and structured symbolic layouts, where correctness depends on precise visual composition. This naturally raises the question of whether generative models can still solve such problems when the answer must be rendered visually rather than written in text. To study this problem, we introduce MathGen, a rigorous benchmark of 900 problems spanning seven core domains, each paired with an executable verifier under a Script-as-a-Judge protocol for deterministic and objective evaluation. Experiments on representative open-source and proprietary text-to-image (T2I) models show that mathematical fidelity remains a major bottleneck: even the best closed-source model reaches only 42.0% overall accuracy, while open-source models achieve just ~1-11%, often near 0% on structured tasks. Overall, current T2I models remain far from competent at even elementary mathematical visual generation.
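
The abstract does not describe how the verifiers work internally, so the following is only an illustrative sketch of the general idea behind pairing each problem with a deterministic, executable check and aggregating pass/fail results into accuracy. Everything here is an assumption made for illustration: the `Problem` class, the `line_verifier` helper, the `"plots"` domain name, and the idea that a generated image has already been reduced to a dictionary of extracted points. None of it is taken from MathGen itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Problem:
    """Hypothetical benchmark item: a domain label plus a deterministic check."""
    domain: str
    verifier: Callable[[dict], bool]


def line_verifier(slope: float, intercept: float, tol: float = 1e-6) -> Callable[[dict], bool]:
    """Hypothetical verifier: passes if at least two extracted points lie on y = slope*x + intercept."""
    def check(output: dict) -> bool:
        points = output.get("points", [])
        if len(points) < 2:
            return False
        return all(abs(y - (slope * x + intercept)) <= tol for x, y in points)
    return check


def score(problems: List[Problem], outputs: List[dict]) -> Tuple[Dict[str, float], float]:
    """Run every verifier once and return (per-domain accuracy, overall accuracy)."""
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for problem, output in zip(problems, outputs):
        totals[problem.domain] = totals.get(problem.domain, 0) + 1
        passed[problem.domain] = passed.get(problem.domain, 0) + int(problem.verifier(output))
    per_domain = {d: passed[d] / totals[d] for d in totals}
    overall = sum(passed.values()) / sum(totals.values())
    return per_domain, overall


# Two assumed problems asking for the line y = 2x + 1; the second output is wrong.
problems = [Problem("plots", line_verifier(2, 1)), Problem("plots", line_verifier(2, 1))]
outputs = [{"points": [(0, 1), (1, 3)]}, {"points": [(0, 1), (1, 4)]}]
per_domain, overall = score(problems, outputs)
```

Because each check is an ordinary function with a fixed tolerance, rerunning it on the same output always gives the same verdict, which is the property the abstract attributes to script-based judging.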

