CL · Apr 30, 2024

Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics

MIT
arXiv:2404.19178v2 · 6 citations · h-index: 10
Originality: Incremental advance
AI Analysis

The finding challenges the assumption that transformers are uniquely suited to modeling human language comprehension, and it may influence debates in cognitive science and NLP.

The paper addresses the problem of modeling human language comprehension and shows that contemporary recurrent models (RWKV and Mamba) can match or exceed comparably sized transformers on this task.

Transformers have generally supplanted recurrent neural networks as the dominant architecture for both natural language processing tasks and for modeling the effect of predictability on online human language comprehension. However, two recently developed recurrent model architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match, and in some cases exceed, the performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.

Code Implementations: 1 repo
