CL, Aug 17, 2025

The Structural Sources of Verb Meaning Revisited: Large Language Models Display Syntactic Bootstrapping

arXiv:2508.12482v1, 1 citation, h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work tests a developmental hypothesis from linguistics using large language models. It offers a scalable approach, but it is incremental in that it applies existing methods to new data.

The study investigated whether large language models exhibit syntactic bootstrapping, a mechanism hypothesized for child language learning, by training RoBERTa and GPT-2 on datasets with ablated syntactic information. Verb representations degraded more when syntactic cues were removed than when co-occurrence information was removed, especially for mental verbs, while noun representations were more affected by co-occurrence distortion.

Syntactic bootstrapping (Gleitman, 1990) is the hypothesis that children use the syntactic environments in which a verb occurs to learn its meaning. In this paper, we examine whether large language models exhibit similar behavior. We do this by training RoBERTa and GPT-2 on perturbed datasets where syntactic information is ablated. Our results show that models' verb representation degrades more when syntactic cues are removed than when co-occurrence information is removed. Furthermore, the representation of mental verbs, for which syntactic bootstrapping has been shown to be particularly crucial in human verb learning, is more negatively impacted in such training regimes than that of physical verbs. In contrast, models' representation of nouns is affected more when co-occurrences are distorted than when syntax is distorted. In addition to reinforcing the important role of syntactic bootstrapping in verb learning, our results demonstrate the viability of testing developmental hypotheses on a larger scale through manipulating the learning environments of large language models.
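
The abstract does not say how the training data were perturbed. As an illustration only, the following Python sketch shows one way the two kinds of perturbation could be contrasted. Shuffling word order stands in for syntactic ablation, and swapping content words for random vocabulary items stands in for co-occurrence distortion. The function names, the `FUNCTION_WORDS` set, and the toy vocabulary are all assumptions made for this sketch and are not taken from the paper.

```python
import random

# Hypothetical closed-class word list; a real setup would need a proper tagger.
FUNCTION_WORDS = {"the", "a", "that", "to", "is", "was", "of", "in"}


def ablate_syntax(sentence, rng):
    """Shuffle word order: the bag of words (co-occurrence) is kept,
    but the syntactic frame around each word is destroyed."""
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def distort_cooccurrence(sentence, target, vocab, rng):
    """Replace every content word except the target with a random
    vocabulary item: the syntactic frame (function words, positions)
    is kept, but the target's lexical neighbours are scrambled."""
    out = []
    for tok in sentence.split():
        if tok == target or tok in FUNCTION_WORDS:
            out.append(tok)
        else:
            out.append(rng.choice(vocab))
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the girl thinks that the dog ran"
    toy_vocab = ["table", "river", "jumped", "cloud", "sang"]
    print(ablate_syntax(sentence, rng))
    print(distort_cooccurrence(sentence, "thinks", toy_vocab, rng))
```

Under these assumptions, the first output still contains "thinks" alongside "girl" and "dog" but no longer shows that the verb takes a "that" clause, while the second keeps the frame "the _ thinks that the _ _" and changes only the surrounding content words.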

