CL AI SE
Jun 30, 2025

Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones

Daking Rai, Samuel Miller, Kevin Moran, Ziyu Yao

arXiv:2507.00322v1
8.33 citations
h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses a fundamental reliability issue in language models for coding and reasoning tasks, though the advance is incremental because it builds on existing steering methods.

The study examines why language models make errors on simple syntactic tasks such as generating balanced parentheses. It finds that errors occur when faulty internal mechanisms overshadow sound ones, and it introduces RASteer, a steering method that raises the accuracy of some models from 0% to around 100% on these tasks.

Despite remarkable advances in coding capabilities, language models (LMs) still struggle with simple syntactic tasks such as generating balanced parentheses. In this study, we investigate the underlying mechanisms behind the persistence of these errors across LMs of varying sizes (124M-7B) to both understand and mitigate the errors. Our study reveals that LMs rely on a number of components (attention heads and FF neurons) that independently make their own predictions. While some components reliably promote correct answers across a generalized range of inputs (i.e., implementing "sound mechanisms"), others are less reliable and introduce noise by promoting incorrect tokens (i.e., implementing "faulty mechanisms"). Errors occur when the faulty mechanisms overshadow the sound ones and dominantly affect the predictions. Motivated by this insight, we introduce RASteer, a steering method to systematically identify and increase the contribution of reliable components for improving model performance. RASteer substantially improves performance on balanced parentheses tasks, boosting accuracy of some models from 0% to around 100% without impairing the models' general coding ability. We further demonstrate its broader applicability in arithmetic reasoning tasks, achieving performance gains of up to around 20%.
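The idea in the abstract (components that each cast their own vote, with errors arising when unreliable votes outweigh reliable ones) can be illustrated with a toy sketch. This is not the RASteer procedure: the abstract does not say how reliable components are identified or by how much their contribution is increased. Every name, number, the `threshold`, and the `boost` factor below are hypothetical, chosen only to make the interference effect visible.

```python
from typing import Dict, List

# Toy model: each "component" contributes a logit to each candidate token.
# All values are invented for illustration; nothing here comes from the paper.
Contribution = Dict[str, float]


def predict(components: List[Contribution], weights: List[float]) -> str:
    """Sum the weighted component contributions and return the top token."""
    totals: Dict[str, float] = {}
    for contrib, w in zip(components, weights):
        for token, logit in contrib.items():
            totals[token] = totals.get(token, 0.0) + w * logit
    return max(totals, key=totals.get)


def component_accuracy(examples: List[List[Contribution]], gold: List[str]) -> List[float]:
    """Fraction of examples on which each component alone favors the gold token."""
    n_components = len(examples[0])
    scores = []
    for c in range(n_components):
        hits = sum(
            max(example[c], key=example[c].get) == answer
            for example, answer in zip(examples, gold)
        )
        scores.append(hits / len(gold))
    return scores


def steer_weights(accuracies: List[float], threshold: float = 0.9, boost: float = 3.0) -> List[float]:
    """Assumed steering rule: upweight components whose solo accuracy meets the threshold."""
    return [boost if a >= threshold else 1.0 for a in accuracies]


# Three hypothetical components: index 0 is "sound", indices 1 and 2 are "faulty".
examples = [
    [{")": 1.0, "))": 0.0}, {")": 0.0, "))": 0.8}, {")": 0.6, "))": 0.0}],
    [{")": 0.0, "))": 1.0}, {")": 0.0, "))": 0.9}, {")": 0.7, "))": 0.0}],
    [{")": 1.0, "))": 0.0}, {")": 0.0, "))": 0.7}, {")": 0.0, "))": 0.6}],
]
gold = [")", "))", ")"]

uniform = [1.0, 1.0, 1.0]
steered = steer_weights(component_accuracy(examples, gold))

# On the third example the two faulty components together outvote the sound one.
before = predict(examples[2], uniform)
after = predict(examples[2], steered)
```

In this toy setup the sound component is correct on every example, yet on the third example its vote of 1.0 for `)` loses to the combined 1.3 for `))` from the two faulty components. Upweighting the sound component flips the prediction back to `)`, which is the kind of effect the abstract attributes to increasing the contribution of reliable components.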

