Diverging Transformer Predictions for Human Sentence Processing: A Comprehensive Analysis of Agreement Attraction Effects
This work evaluates transformers as cognitive models of human sentence processing and highlights limitations of current models that matter to linguists and cognitive scientists. The contribution is incremental: it extends prior evaluations to a more comprehensive set of agreement attraction configurations.
The study evaluated eleven autoregressive transformers on English agreement attraction effects to assess their cognitive adequacy as models of human sentence processing. Model predictions align with human data for prepositional phrase configurations, but performance degrades significantly on object-extracted relative clause configurations, and no model replicates the asymmetric interference patterns observed in humans.
Transformers underlie almost all state-of-the-art language models in computational linguistics, yet their cognitive adequacy as models of human sentence processing remains disputed. In this work, we use a surprisal-based linking mechanism to systematically evaluate eleven autoregressive transformers of varying sizes and architectures on a more comprehensive set of English agreement attraction configurations than prior work. Our experiments yield mixed results: While transformer predictions generally align with human reading time data for prepositional phrase configurations, performance degrades significantly on object-extracted relative clause configurations. In the latter case, predictions also diverge markedly across models, and no model successfully replicates the asymmetric interference patterns observed in humans. We conclude that current transformer models do not explain human morphosyntactic processing, and that evaluations of transformers as cognitive models must adopt rigorous, comprehensive experimental designs to avoid spurious generalizations from isolated syntactic configurations or individual models.
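To make the surprisal-based linking mechanism concrete, the following is a minimal sketch, assuming the standard definition of surprisal as the negative log probability of a word given its preceding context. The function names, condition labels, and probabilities are hypothetical placeholders chosen for illustration; they are not the study's materials or results. In practice, the probabilities would be read off an autoregressive model's next-token distribution at the critical verb.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of a word whose conditional probability is `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(prob)


def surprisal_difference(p_condition_a: float, p_condition_b: float) -> float:
    """Surprisal at the critical word in condition B minus condition A.

    A positive value means the word is less expected in condition B,
    which a surprisal-based linking mechanism maps to slower reading.
    """
    return surprisal(p_condition_b) - surprisal(p_condition_a)


# Hypothetical probabilities of the critical verb under two attractor
# conditions. These numbers are invented for illustration only.
toy_probs = {
    "grammatical": {"attractor_match": 0.5, "attractor_mismatch": 0.25},
    "ungrammatical": {"attractor_match": 0.0625, "attractor_mismatch": 0.125},
}

# Effect of the attractor manipulation within each grammaticality level.
effects = {
    condition: surprisal_difference(p["attractor_match"], p["attractor_mismatch"])
    for condition, p in toy_probs.items()
}

for condition, effect in effects.items():
    print(f"{condition}: {effect:+.2f} bits")
```

Comparing the sign and size of such differences across configurations and models is one way a predicted interference pattern could be set against human reading time data. The comparison procedure actually used in the study may differ from this sketch.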