LG CL Nov 25, 2024

Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures

Fu-Chieh Chang, You-Chen Lin, Pei-Yuan Wu

arXiv:2411.16260v3 | 1 citation | h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses the problem of understanding how LLMs perform arithmetic. It is aimed at researchers and offers incremental insights into improving arithmetic performance.

The paper investigates how large language models perform arithmetic, proposing that they capture algebraic structures such as commutativity and identity properties. It supports this claim empirically with a custom dataset and with theoretical evidence, and it suggests that leveraging these structures can enhance arithmetic capabilities.

Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting, which decomposes complex reasoning into step-by-step solutions. This approach has enabled significant advancements, as evidenced by performance on benchmarks like GSM8K and MATH. However, the mechanisms underlying LLMs' ability to perform arithmetic in a single step of CoT remain poorly understood. Existing studies debate whether LLMs encode numerical values or rely on symbolic reasoning, while others explore attention and multi-layered processing in arithmetic tasks. In this work, we propose that LLMs learn arithmetic by capturing algebraic structures, such as commutativity and identity properties. Since these structures are observable through input-output relationships, they can generalize to unseen data. We empirically demonstrate that LLMs can learn algebraic structures using a custom dataset of arithmetic problems, and we provide theoretical evidence showing that, under specific configurations of weights and biases, transformer-based LLMs can generate embeddings that remain invariant to both permutations of input tokens and the presence of identity elements. Our findings indicate that leveraging algebraic structures can enhance LLMs' arithmetic capabilities, offering insights into improving their arithmetic performance.
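To make the two invariances in the abstract concrete, here is a minimal Python sketch of an embedding that is unchanged by reordering its input tokens and by inserting identity elements. It is a toy illustration only, not the paper's transformer construction: the sum-pooling design, the power-feature map, and the names `token_features` and `embed` are all assumptions introduced here, and the operation is assumed to be integer addition with identity element 0.

```python
# Toy illustration (NOT the paper's construction): a sum-pooled embedding
# that is invariant to token order and to inserted identity elements.
# Assumption: the operation is integer addition, so the identity token is 0.

DIM = 3  # hypothetical embedding dimension


def token_features(token: int, dim: int = DIM) -> tuple:
    """Map an integer token to the features (t, t^2, ..., t^dim).

    The additive identity 0 maps to the all-zero vector, so it adds
    nothing to a sum-pooled embedding.
    """
    return tuple(token ** k for k in range(1, dim + 1))


def embed(tokens: list, dim: int = DIM) -> tuple:
    """Sum-pool the per-token features.

    Integer addition is commutative and associative, so the result does
    not depend on the order of `tokens`.
    """
    pooled = [0] * dim
    for tok in tokens:
        for i, value in enumerate(token_features(tok, dim)):
            pooled[i] += value
    return tuple(pooled)


if __name__ == "__main__":
    print(embed([3, 5, 7]) == embed([7, 3, 5]))  # permutation invariance
    print(embed([3, 0, 5]) == embed([3, 5]))     # identity invariance
    print(embed([3, 5]) == embed([3, 6]))        # different inputs still differ
```

Integer features are used so that both equalities hold exactly; with floating-point vectors, summation order could introduce small rounding differences, and the comparison would need a tolerance.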
