LG, CL · Dec 5, 2024

Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization

arXiv:2412.04619v4 · 7 citations · h-index: 6
Originality: Incremental advance
AI Analysis

The paper is aimed at NLP and AI researchers who want to understand how language models generalize. It is an incremental advance, building on prior work that uses grammar-learning tasks.

The study investigates how the complexity and diversity of training data determine whether language models learn hierarchical syntactic rules or linear shortcuts. Complex data containing center-embedded clauses drives hierarchical generalization, diverse data promotes stable rule learning, and intermediate conditions lead to unstable outcomes.

Early in training, LMs can behave like n-gram models, but eventually they often learn tree-based syntactic rules and generalize hierarchically out of distribution (OOD). We study this shift using controlled grammar-learning tasks: question formation and tense inflection. We find that a model learns to generalize hierarchically if its training data is _complex_, in particular if it includes center-embedded clauses, a special syntactic structure. Under this definition, complex data drives hierarchical rules, while less complex data encourages shortcut learning in the form of n-gram-like linear rules. Furthermore, we find that a model uses rules to generalize, whether hierarchical or linear, if its training data is _diverse_, in particular if it includes many distinct syntax trees in the training set. Under this definition, diverse data promotes stable rule learning, whereas less diverse data promotes memorization of individual syntactic sequences. Finally, intermediate diversity and intermediate complexity form an _unstable regime_, which is characterized by oscillatory learning dynamics and inconsistent behaviors across random seeds. These results highlight the central role of training data in shaping generalization and explain why competing strategies can lead to unstable outcomes.
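To make the contrast between linear and hierarchical rules concrete, here is a minimal toy sketch of the question-formation task. It is not the paper's code or data: the auxiliary list, the example sentences, the function names, and the hand-supplied `main_aux_index` (standing in for information a parser would provide) are all assumptions made for illustration. The linear rule fronts the first auxiliary in the string, while the hierarchical rule fronts the auxiliary of the main clause.

```python
# Toy illustration (hypothetical, not the paper's setup) of two competing
# rules for turning a declarative sentence into a yes/no question.

AUXILIARIES = {"do", "does", "don't", "doesn't"}  # assumed toy vocabulary


def linear_rule(tokens):
    """Front the linearly first auxiliary (the n-gram-like shortcut)."""
    i = next(k for k, tok in enumerate(tokens) if tok in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


def hierarchical_rule(tokens, main_aux_index):
    """Front the main-clause auxiliary.

    main_aux_index is supplied by hand here; in a real pipeline it would
    come from a syntactic parse of the sentence.
    """
    i = main_aux_index
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


# Without an embedded clause, both rules agree, so the data is ambiguous.
simple = "my walrus does move the dogs".split()
print(" ".join(linear_rule(simple)))
print(" ".join(hierarchical_rule(simple, main_aux_index=2)))

# With a clause embedded inside the subject, the rules disagree.
embedded = "my walrus who doesn't read does move the dogs".split()
print(" ".join(linear_rule(embedded)))
print(" ".join(hierarchical_rule(embedded, main_aux_index=5)))
```

On the simple sentence both rules yield the same question, which is why such data alone cannot tell the two strategies apart. On the sentence with an embedded clause only the hierarchical rule produces a grammatical question, which is consistent with the abstract's claim that center-embedded training data pushes models toward the hierarchical rule.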
