CL Nov 5, 2025

Bearing Syntactic Fruit with Stack-Augmented Neural Networks

arXiv:2511.03547v12.7 h-index: 4 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of modeling human language acquisition with neural networks and offers a potentially more accurate tool for psycholinguistic studies, though it is an incremental improvement over existing architectures.

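The paper tackles the problem that standard neural networks show human-like hierarchical syntactic generalization only under special conditions (syntactic supervision, pre-training on massive corpora, or training long past convergence). It demonstrates that stack-augmented neural networks generalize hierarchically without those conditions, and that transformers with nondeterministic stacks generalize best among the tested architectures on a question formation task.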
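To make the question formation task concrete, here is a minimal sketch of the two competing hypotheses: a linear rule that fronts the first auxiliary in the string, and a hierarchical rule that fronts the main-clause auxiliary. The toy sentences, the `AUXILIARIES` set, and the function names are illustrative assumptions and are not taken from the paper's dataset or code.

```python
# Toy illustration (assumed vocabulary, not the paper's data).
AUXILIARIES = {"can", "will", "does", "do"}

def linear_rule(subject, main_aux, predicate):
    """Front the FIRST auxiliary in the flat word string."""
    tokens = subject + [main_aux] + predicate
    i = next(k for k, tok in enumerate(tokens) if tok in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def hierarchical_rule(subject, main_aux, predicate):
    """Front the MAIN-CLAUSE auxiliary, wherever it sits in the string."""
    return [main_aux] + subject + predicate

# Ambiguous example: both rules give the same question.
simple = (["the", "dog"], "can", ["bark"])
# Disambiguating example: the subject contains a relative clause
# with its own auxiliary, so the two rules diverge.
nested = (["the", "dog", "that", "will", "run"], "can", ["bark"])

for subject, aux, predicate in (simple, nested):
    print("linear:      ", " ".join(linear_rule(subject, aux, predicate)))
    print("hierarchical:", " ".join(hierarchical_rule(subject, aux, predicate)))
```

On the simple sentence both rules produce "can the dog bark", so training data of that kind cannot tell them apart. On the nested sentence only the hierarchical rule yields the grammatical "can the dog that will run bark".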

Any finite set of training data is consistent with an infinite number of hypothetical algorithms that could have generated it. Studies have shown that when human children learn language, they consistently favor hypotheses based on hierarchical syntactic rules without ever encountering disambiguating examples. A recent line of work has inquired as to whether common neural network architectures share this bias, finding that they do so only under special conditions: when syntactically supervised, when pre-trained on massive corpora, or when trained long past convergence. In this paper, we demonstrate, for the first time, neural network architectures that are able to generalize in human-like fashion without any of the aforementioned requirements: stack-augmented neural networks. We test three base architectures (transformer, simple RNN, LSTM) augmented with two styles of stack: the superposition stack of Joulin & Mikolov (2015) and a nondeterministic generalization of it proposed by DuSell & Chiang (2023). We find that transformers with nondeterministic stacks generalize best out of these architectures on a classical question formation task. We also propose a modification to the stack RNN architecture that improves hierarchical generalization. These results suggest that stack-augmented neural networks may be more accurate models of human language acquisition than standard architectures, serving as useful objects of psycholinguistic study. Our code is publicly available.
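For readers unfamiliar with the superposition stack mentioned in the abstract, the following is a minimal sketch of one update step in the style of Joulin & Mikolov (2015): the new stack is a weighted blend of the stacks that would result from pushing, popping, and doing nothing. The fixed depth, the plain-list representation, and the function name `superposition_update` are assumptions made for illustration. This is not the authors' implementation, and it does not cover the nondeterministic stack of DuSell & Chiang (2023).

```python
def superposition_update(stack, pushed, a_push, a_pop, a_noop):
    """One soft stack update (sketch, assumed fixed depth).

    stack:  list of vectors (lists of floats), top of stack at index 0
    pushed: vector that a push would place on top
    a_push, a_pop, a_noop: action weights, assumed to sum to 1
    """
    depth = len(stack)
    zero = [0.0] * len(pushed)
    new_stack = []
    for i in range(depth):
        # After a push, slot i holds what was one slot above (or the new vector).
        from_push = pushed if i == 0 else stack[i - 1]
        # After a pop, slot i holds what was one slot below (zeros past the bottom).
        from_pop = stack[i + 1] if i + 1 < depth else zero
        # After a no-op, slot i is unchanged.
        from_noop = stack[i]
        new_stack.append([
            a_push * p + a_pop * q + a_noop * r
            for p, q, r in zip(from_push, from_pop, from_noop)
        ])
    return new_stack

stack = [[1.0], [2.0], [0.0]]
print(superposition_update(stack, [5.0], 1.0, 0.0, 0.0))  # pure push
print(superposition_update(stack, [5.0], 0.0, 1.0, 0.0))  # pure pop
print(superposition_update(stack, [5.0], 0.5, 0.5, 0.0))  # blend of both
```

In a stack-augmented network the action weights and the pushed vector would come from the controller (for example, a softmax over a hidden state), which keeps every step differentiable. Here they are passed in by hand to keep the sketch self-contained.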

