Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms
For researchers studying LLM internals, this paper offers mechanistic insights into arithmetic reasoning, but its findings are incremental because they largely confirm known modularity patterns.
The paper investigates how LLMs perform arithmetic operations by tracing next-token predictions across layers using early decoding. It finds that correct result generation occurs only in final layers, and proficient models show a clear division of labor between attention and MLP modules, which is absent in less proficient models.
Large language models (LLMs) have demonstrated impressive capabilities, yet their internal mechanisms for handling reasoning-intensive tasks remain underexplored. To advance the understanding of model-internal processing, we investigate how LLMs perform arithmetic operations by examining their internal mechanisms during task execution. Using early decoding, we trace how next-token predictions are constructed across layers. Our experiments reveal that while the models recognize arithmetic tasks early, correct result generation occurs only in the final layers. Notably, models proficient in arithmetic exhibit a clear division of labor between attention and MLP modules, where attention modules propagate input information and MLP modules aggregate it. This division is absent in less proficient models. Furthermore, successful models appear to process more challenging arithmetic tasks functionally, computing results rather than retrieving memorized facts, which suggests reasoning capabilities beyond factual recall.
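To make the early-decoding idea concrete, here is a minimal sketch, not the authors' implementation. It projects each layer's hidden state through an unembedding matrix and records the top-1 token per layer. The vocabulary, the hidden states, the identity matrix `W_U`, and the helper names `unembed`, `early_decode`, and `first_correct_layer` are all hypothetical toy values chosen for illustration. A real setup would read hidden states from a model and would typically apply the final layer norm before unembedding, which is omitted here.

```python
from typing import List, Optional, Sequence


def unembed(hidden: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a hidden state onto the vocabulary: logits[v] = sum_d hidden[d] * W_U[d][v]."""
    vocab_size = len(W_U[0])
    return [sum(h * row[v] for h, row in zip(hidden, W_U)) for v in range(vocab_size)]


def early_decode(
    layer_states: Sequence[Sequence[float]],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[str]:
    """Return the top-1 token decoded from each layer's hidden state."""
    preds = []
    for hidden in layer_states:
        logits = unembed(hidden, W_U)
        best = max(range(len(logits)), key=logits.__getitem__)
        preds.append(vocab[best])
    return preds


def first_correct_layer(preds: Sequence[str], target: str) -> Optional[int]:
    """Index of the first layer whose top-1 token equals the target, or None."""
    for i, token in enumerate(preds):
        if token == target:
            return i
    return None


# Toy data (assumed, not taken from the paper): prompt "5 + 7 =", target "12".
vocab = ["the", "+", "7", "12"]
W_U = [  # hypothetical identity unembedding, one row per hidden dimension
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
layer_states = [  # one hidden state per layer at the final token position
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.3, 0.8, 0.1],
    [0.1, 0.2, 0.7, 0.5],
    [0.0, 0.1, 0.4, 0.9],
]

preds = early_decode(layer_states, W_U, vocab)
print(preds)
print(first_correct_layer(preds, "12"))
```

In this toy trace a numeric token surfaces in the middle layers, but the correct result only becomes the top prediction at the last layer, which mirrors the pattern the abstract describes.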