Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
This work addresses the debate over the robustness of reasoning in LLMs by showing how symbolic mechanisms emerge inside the models, potentially bridging symbolic and neural network approaches. Its contribution is incremental in that it clarifies the basis of existing capabilities.
The study investigated the internal mechanisms that enable abstract reasoning in large language models and identified an emergent symbolic architecture that performs three computations across model layers: symbol abstraction, symbolic induction, and retrieval.
Many recent studies have found evidence for emergent reasoning capabilities in large language models (LLMs), but debate persists concerning the robustness of these capabilities, and the extent to which they depend on structured reasoning mechanisms. To shed light on these issues, we study the internal mechanisms that support abstract reasoning in LLMs. We identify an emergent symbolic architecture that implements abstract reasoning via a series of three computations. In early layers, symbol abstraction heads convert input tokens to abstract variables based on the relations between those tokens. In intermediate layers, symbolic induction heads perform sequence induction over these abstract variables. Finally, in later layers, retrieval heads predict the next token by retrieving the value associated with the predicted abstract variable. These results point toward a resolution of the longstanding debate between symbolic and neural network approaches, suggesting that emergent reasoning in neural networks depends on the emergence of symbolic mechanisms.
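The three computations described above can be illustrated with a toy, purely symbolic sketch. This is not the authors' method and not a model of attention heads: it only mirrors the abstract-induce-retrieve pipeline in ordinary Python. The function names (`abstract_symbols`, `induce_next_variable`, `retrieve_token`, `predict_next_token`), the identity-rule task format (patterns such as ABA or ABB over arbitrary tokens), and the majority-vote induction step are all assumptions made for illustration.

```python
from collections import Counter

VARIABLE_NAMES = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def abstract_symbols(tokens):
    """Stage 1 (hypothetical analogue of symbol abstraction).

    Replace each token with an abstract variable determined only by
    identity relations within the sequence: the first distinct token
    becomes "A", the second distinct token becomes "B", and so on.
    Returns the variable sequence and the token-to-variable binding.
    """
    binding = {}
    variables = []
    for tok in tokens:
        if tok not in binding:
            binding[tok] = VARIABLE_NAMES[len(binding)]
        variables.append(binding[tok])
    return variables, binding


def induce_next_variable(example_patterns, query_prefix):
    """Stage 2 (hypothetical analogue of symbolic induction).

    Among the in-context examples whose abstract pattern starts with the
    query's abstract prefix, take a majority vote on the next variable.
    Returns None when no example matches the prefix.
    """
    n = len(query_prefix)
    votes = Counter(
        pattern[n]
        for pattern in example_patterns
        if len(pattern) > n and pattern[:n] == query_prefix
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def retrieve_token(variable, binding):
    """Stage 3 (hypothetical analogue of retrieval).

    Look up the concrete token bound to the predicted abstract variable
    in the query. Returns None if the variable is unbound.
    """
    inverse = {var: tok for tok, var in binding.items()}
    return inverse.get(variable)


def predict_next_token(examples, query):
    """Run the three stages in order on in-context examples and a query."""
    patterns = [abstract_symbols(example)[0] for example in examples]
    query_vars, binding = abstract_symbols(query)
    next_var = induce_next_variable(patterns, query_vars)
    return retrieve_token(next_var, binding)


if __name__ == "__main__":
    # Two in-context examples following an ABA rule, then an incomplete query.
    aba_examples = [["la", "ti", "la"], ["mo", "ke", "mo"]]
    print(predict_next_token(aba_examples, ["xu", "ra"]))
```

In this sketch the prediction depends only on the relational pattern among tokens, never on the tokens themselves, which is the property the abstract attributes to the abstract variables. In the model, by contrast, these steps are reported as being carried out by attention heads in early, intermediate, and later layers rather than by explicit dictionaries.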