AI · CL · May 19

What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

arXiv:2605.1976273.21 citations
AI Analysis

For researchers and practitioners training large language models, this work clarifies the role of code in reasoning and provides data-centric strategies to optimize cross-domain trade-offs.

The paper investigates whether code improves mathematical reasoning in language models. In controlled pretraining experiments on a 10T-token corpus, standalone executable code improves programming ability but does not act as a general reasoning enhancer. The gains usually attributed to code are better explained by structured reasoning traces such as code-text and math-text mixtures. Increasing the density of structured math-domain samples within a fixed math budget improves difficult mathematical reasoning while largely preserving programming performance.

Code has become a standard component of modern foundation language model (LM) training, yet its role beyond programming remains unclear. We revisit the claim that code improves reasoning through controlled pretraining experiments on a 10T-token corpus with fine-grained domain separation. Our findings are threefold. First, when code is restricted to standalone executable programs and Code-NL data are controlled for, code substantially improves programming ability but does not act as a general reasoning enhancer; instead, it competes with knowledge-intensive tasks, especially complex mathematical reasoning. Second, the reasoning gains often attributed to code are better explained by cross-domain structured reasoning traces, such as code-text and math-text mixtures, rather than by executable code alone. Third, increasing the density of structured math-domain samples within a fixed math budget yields substantial gains on difficult mathematical reasoning while largely preserving programming performance, suggesting that cognitive scaffolds offer a targeted way to mitigate cross-domain trade-offs. Finally, routing analyses show that data-composition effects are reflected in expert-activation patterns, providing mechanism-level evidence for competitive and synergistic interactions across domains. Our results clarify which data characteristics transfer across capability dimensions and point to more precise data-centric optimization strategies.
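As a rough illustration of the third finding, the sketch below holds a math token budget constant and varies only the share given to structured samples. The function name, the two-way split, and the numbers are hypothetical; the abstract does not describe how the authors actually built their mixtures.

```python
def split_math_budget(math_budget_tokens: int, structured_fraction: float) -> dict:
    """Split a fixed math token budget into structured and plain portions.

    Hypothetical helper: 'structured_fraction' stands in for the "density of
    structured math-domain samples"; the paper's real procedure is not given.
    """
    if math_budget_tokens < 0:
        raise ValueError("math_budget_tokens must be non-negative")
    if not 0.0 <= structured_fraction <= 1.0:
        raise ValueError("structured_fraction must be in [0, 1]")

    structured = round(math_budget_tokens * structured_fraction)
    # The plain portion absorbs the remainder, so the total never changes.
    plain = math_budget_tokens - structured
    return {"structured_math": structured, "plain_math": plain}


# Raising the density changes the composition, not the size, of the math budget.
low_density = split_math_budget(1_000, 0.25)
high_density = split_math_budget(1_000, 0.75)
```

Because the total is fixed, any difference between the two settings can be attributed to composition rather than to extra math tokens, which is the kind of controlled comparison the abstract describes.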

