CL | AI | Jan 15, 2025

Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models

arXiv:2501.08618v1 | 2 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This research addresses the problem of understanding how LLMs process linguistic structure, with implications for cognitive science and AI, though it is incremental in building on prior neurological findings.

The study investigated whether large language models develop distinct processing mechanisms for hierarchical versus linear grammars, finding that they exhibit separate components for each grammar type and that hierarchy sensitivity persists even with nonce words, indicating it is not tied to meaning or in-distribution inputs.

All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the underlying grammars to conform to either hierarchical or linear/positional rules. Using these grammars, we first observe that language models show distinct behaviors on hierarchical versus linearly structured inputs. Then, we find that the components responsible for processing hierarchical grammars are distinct from those that process linear grammars; we causally verify this in ablation experiments. Finally, we observe that hierarchy-selective components are also active on nonce grammars; this suggests that hierarchy sensitivity is not tied to meaning, nor in-distribution inputs.
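To make the hierarchical versus linear/positional contrast concrete, here is a minimal sketch of how two such rule types can differ over the same vocabulary. It is an illustration only: the vocabulary, the agreement rule, the fixed-position marker rule, and every function name below are assumptions made for this example, not the grammars or code used in the paper.

```python
import random

# Hypothetical toy vocabulary (not taken from the paper).
NOUNS = {"sg": ["dog", "cat"], "pl": ["dogs", "cats"]}
VERBS = {"sg": "runs", "pl": "run"}

# Hypothetical nonce substitutes for the content words above.
NONCE = {"dog": "blick", "dogs": "blicks", "cat": "dax", "cats": "daxes",
         "runs": "wugs", "run": "wug"}


def hierarchical_sentence(rng):
    """Structure-dependent rule: the verb agrees with the head noun of the
    subject phrase, not with the linearly closest (embedded) noun."""
    head_num = rng.choice(["sg", "pl"])
    inner_num = rng.choice(["sg", "pl"])
    head = rng.choice(NOUNS[head_num])
    inner = rng.choice(NOUNS[inner_num])
    return ["the", head, "near", "the", inner, VERBS[head_num]]


def linear_sentence(rng, position=3, marker="not"):
    """Position-dependent rule: insert a marker at a fixed token index,
    regardless of the phrase structure of the sentence."""
    tokens = hierarchical_sentence(rng)
    return tokens[:position] + [marker] + tokens[position:]


def to_nonce(tokens):
    """Swap content words for nonce words, leaving the structure intact."""
    return [NONCE.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    print(" ".join(hierarchical_sentence(rng)))
    print(" ".join(linear_sentence(rng)))
    print(" ".join(to_nonce(hierarchical_sentence(rng))))
```

Both generators draw from the same vocabulary, so any difference in how a model handles their outputs would come from the rule type rather than the words. The nonce mapping keeps the rule while removing lexical meaning.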

Code Implementations: 1 repo
