GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving
This work provides insights into the internal reasoning mechanisms of multimodal large language models for geometry problems, but the results are primarily analytical and incremental.
The paper introduces GeoMathCode, a framework that uses programmatic representations as intermediate visual outputs for geometry problem solving. It shows that reasoning steps and code generation steps can be disentangled in latent space, and that supervised fine-tuning makes the reasoning manifold more structured and informative.
Mathematical reasoning is a hallmark of human intelligence, requiring logical deduction, symbolic manipulation, and abstract thinking. Recent multimodal large language models (MLLMs) have demonstrated strong performance on geometry problems through multi-step reasoning. To better emulate human problem-solving, intermediate steps can incorporate auxiliary visual constructions, such as additional lines or points, which improve geometric interpretation and educational clarity. In this work, we introduce GeoMathCode, a framework in which programmatic representations serve as intermediate visual outputs. We further conduct an in-depth analysis of the underlying reasoning geometry. Experimental results show that reasoning and code generation steps can be disentangled in the latent space, while supervised fine-tuning (SFT) makes the reasoning manifold more structured and informative. Moreover, hierarchical syntactic code structures emerge as disentangled latent subspaces, and these subspaces contain more mathematical symbolic information than visual representations do.
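To make the idea of a programmatic representation serving as an intermediate visual output more concrete, the following is a minimal sketch of how an auxiliary construction (here, a median added to a right triangle) might be expressed as code. The names `Point`, `midpoint`, `segment`, `build_median_construction`, and `length`, as well as the coordinates, are hypothetical and chosen only for illustration; the abstract does not specify the code format that GeoMathCode actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A named point in the plane (hypothetical representation)."""
    name: str
    x: float
    y: float


def midpoint(name, p, q):
    """Return the midpoint of segment pq as a new named point."""
    return Point(name, (p.x + q.x) / 2, (p.y + q.y) / 2)


def segment(p, q):
    """Describe a drawable segment by the names of its endpoints."""
    return ("segment", p.name, q.name)


def length(p, q):
    """Euclidean distance between two points."""
    return ((p.x - q.x) ** 2 + (p.y - q.y) ** 2) ** 0.5


def build_median_construction():
    """Right triangle ABC plus an auxiliary median from A to BC."""
    a = Point("A", 0.0, 0.0)
    b = Point("B", 4.0, 0.0)
    c = Point("C", 0.0, 3.0)
    # Auxiliary point: midpoint of the hypotenuse BC.
    m = midpoint("M", b, c)
    return {
        "points": [a, b, c, m],
        "original": [segment(a, b), segment(b, c), segment(c, a)],
        # Auxiliary line added as an intermediate reasoning step.
        "auxiliary": [segment(a, m)],
    }


figure = build_median_construction()
a, b, c, m = figure["points"]
# The median to the hypotenuse is half the hypotenuse.
print(length(a, m), length(b, c) / 2)
```

In this sketch the auxiliary line is kept separate from the original figure, so a reasoning step can refer to the construction symbolically while a renderer draws it.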