CL | Jun 5, 2025

Information Locality as an Inductive Bias for Neural Language Models

Taiga Someya, Anej Svete, Brian DuSell, Timothy J. O'Donnell, Mario Giulianelli, Ryan Cotterell

AI2, ETH Zurich

arXiv:2506.05136v1 | 10.97 citations | h-index: 17 | Has Code | ACL

Originality: Incremental advance

AI Analysis

This paper addresses the debate over whether neural language models align with human processing constraints, offering insights for researchers in computational linguistics and AI. The advance is incremental: it builds on existing theories and contributes new measures.

The paper investigates inductive biases in neural language models by proposing a quantitative framework built around an information-theoretic measure, m-local entropy. It finds that languages with higher m-local entropy are harder for Transformer and LSTM models to learn, which suggests that these models are sensitive to local statistical structure.

Inductive biases are inherent in every machine learning system, shaping how models generalize from finite data. In the case of neural language models (LMs), debates persist as to whether these biases align with or diverge from human processing constraints. To address this issue, we propose a quantitative framework that allows for controlled investigations into the nature of these biases. Within our framework, we introduce $m$-local entropy (an information-theoretic measure derived from average lossy-context surprisal) that captures the local uncertainty of a language by quantifying how effectively the $m-1$ preceding symbols disambiguate the next symbol. In experiments on both perturbed natural language corpora and languages defined by probabilistic finite-state automata (PFSAs), we show that languages with higher $m$-local entropy are more difficult for Transformer and LSTM LMs to learn. These results suggest that neural LMs, much like humans, are highly sensitive to the local statistical structure of a language.
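To make the idea of "how effectively the $m-1$ preceding symbols disambiguate the next symbol" concrete, the sketch below computes a simple plug-in estimate of the conditional entropy of the next symbol given the previous $m-1$ symbols, using empirical counts from a corpus. This is only an illustration under stated assumptions: the abstract says the measure is derived from average lossy-context surprisal but does not give its exact definition, so the count-based estimator and the function name `m_local_entropy` are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def m_local_entropy(sequences, m):
    """Plug-in estimate (in bits) of H(next symbol | previous m-1 symbols).

    Hypothetical helper for illustration; the paper's exact definition
    and estimator may differ.
    """
    if m < 1:
        raise ValueError("m must be >= 1")

    context_counts = Counter()  # counts of each (m-1)-symbol context
    joint_counts = Counter()    # counts of each (context, next symbol) pair

    for seq in sequences:
        seq = list(seq)
        # Slide a window of length m over the sequence.
        for t in range(m - 1, len(seq)):
            context = tuple(seq[t - (m - 1):t])  # empty tuple when m == 1
            context_counts[context] += 1
            joint_counts[(context, seq[t])] += 1

    total = sum(joint_counts.values())
    if total == 0:
        raise ValueError("no windows of length m in the data")

    entropy = 0.0
    for (context, _symbol), n in joint_counts.items():
        p_joint = n / total                   # empirical P(context, symbol)
        p_cond = n / context_counts[context]  # empirical P(symbol | context)
        entropy -= p_joint * math.log2(p_cond)
    return entropy
```

For the strictly alternating string `"abababab"`, this estimate is 1 bit for `m = 1` (no context, two equally frequent symbols) and 0 bits for `m = 2` (one preceding symbol fully determines the next). Under this reading, a perturbation that destroys local regularities would raise the estimate for small `m`.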
