CL
Dec 10, 2024

Algorithmic Phase Transitions in Language Models: A Mechanistic Case Study of Arithmetic

arXiv:2412.07386v1
1 citation
Originality: Incremental advance
AI Analysis

The paper addresses why language models fail at zero-shot generalization in logical reasoning tasks. The advance is incremental: it builds on existing mechanistic interpretability research.

The paper investigates algorithmic stability in language models, that is, whether a model keeps the same problem-solving strategy when the task specification changes. Using two-operand arithmetic as a case study, it finds that Gemma-2-2b uses substantially different computational models for closely related subtasks, such as four-digit versus eight-digit addition.

Zero-shot capabilities of large language models make them powerful tools for solving a range of tasks without explicit training. It remains unclear, however, how these models achieve such performance, or why they can zero-shot some tasks but not others. In this paper, we shed some light on this phenomenon by defining and investigating algorithmic stability in language models -- changes in problem-solving strategy employed by the model as a result of changes in task specification. We focus on a task where algorithmic stability is needed for generalization: two-operand arithmetic. Surprisingly, we find that Gemma-2-2b employs substantially different computational models on closely related subtasks, i.e. four-digit versus eight-digit addition. Our findings suggest that algorithmic instability may be a contributing factor to language models' poor zero-shot performance across certain logical reasoning tasks, as they struggle to abstract different problem-solving strategies and smoothly transition between them.
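To make the four-digit versus eight-digit comparison concrete, here is a minimal sketch of how prompts for the two subtasks could be generated and scored. It is not the paper's setup: the prompt format `a+b=`, the function names, and the `oracle` stand-in for a model call are all assumptions made for illustration. It also measures only behavioral accuracy, not the internal mechanisms the paper studies.

```python
import random

def make_addition_prompts(n_digits, n_examples, seed=0):
    """Build (prompt, answer) pairs for two-operand addition.

    Both operands have exactly n_digits digits. The "a+b=" prompt
    format is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    pairs = []
    for _ in range(n_examples):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        pairs.append((f"{a}+{b}=", str(a + b)))
    return pairs

def exact_match_accuracy(answer_fn, pairs):
    """Fraction of prompts for which answer_fn returns the exact sum."""
    correct = sum(answer_fn(prompt).strip() == answer for prompt, answer in pairs)
    return correct / len(pairs)

def oracle(prompt):
    """Hypothetical stand-in for a model call: parses the prompt and adds exactly."""
    a, b = prompt.rstrip("=").split("+")
    return str(int(a) + int(b))

# Compare the two closely related subtasks named in the abstract.
for n_digits in (4, 8):
    pairs = make_addition_prompts(n_digits, n_examples=100)
    print(n_digits, exact_match_accuracy(oracle, pairs))
```

Replacing `oracle` with a function that queries an actual language model would give per-subtask accuracy. A gap between the four-digit and eight-digit settings would be consistent with, but would not by itself demonstrate, the change of strategy the abstract describes.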

