Information-Theoretic Storage Cost in Sentence Comprehension
The approach offers a more flexible, theory-neutral way to model cognitive load in psycholinguistics, though the contribution is incremental: it applies existing information-theoretic ideas to a specific domain.
The study addresses the problem of measuring working memory load during real-time sentence comprehension. It proposes a continuous, information-theoretic storage cost metric, defined as the amount of information previous words carry about future context. The metric is validated in three ways: it recovers known processing asymmetries, correlates with grammar-based storage costs, and predicts reading-time variance in English datasets.
Real-time sentence comprehension imposes a significant load on working memory, as comprehenders must maintain contextual information to anticipate future input. While measures of such load have played an important role in psycholinguistic theories, they have largely been formalized using symbolic grammars, which assign discrete, uniform costs to syntactic predictions. This study proposes a measure of processing storage cost based on an information-theoretic formalization: the amount of information that previous words carry about future context under uncertainty. Unlike previous discrete, grammar-based metrics, this measure is continuous, theory-neutral, and can be estimated from pre-trained neural language models. The validity of this approach is demonstrated through three analyses in English: the measure (i) recovers well-known processing asymmetries in center embeddings and relative clauses, (ii) correlates with a grammar-based storage cost in a syntactically annotated corpus, and (iii) predicts reading-time variance in two large-scale naturalistic datasets over and above baseline models with traditional information-based predictors.
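To make the phrase "information previous words carry about future context" concrete, the sketch below computes mutual information between a past word and a future word over a small, hand-written joint distribution. This is only an illustration of the general quantity. The function name `mutual_information` and both toy distributions are hypothetical, and the sketch is not the estimator used in the study, which derives its estimates from pre-trained neural language models.

```python
import math


def mutual_information(joint):
    """Mutual information in bits between two discrete variables.

    `joint` maps (past, future) pairs to probabilities that sum to 1.
    Hypothetical helper for illustration; not the study's estimator.
    """
    p_past, p_future = {}, {}
    # Marginalize the joint table to get P(past) and P(future).
    for (past, future), p in joint.items():
        p_past[past] = p_past.get(past, 0.0) + p
        p_future[future] = p_future.get(future, 0.0) + p

    mi = 0.0
    for (past, future), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            mi += p * math.log2(p / (p_past[past] * p_future[future]))
    return mi


# Assumed toy data: the past word fully determines the future word.
informative = {("the", "dog"): 0.5, ("to", "run"): 0.5}

# Assumed toy data: the past word says nothing about the future word.
uninformative = {
    ("the", "dog"): 0.25,
    ("the", "run"): 0.25,
    ("to", "dog"): 0.25,
    ("to", "run"): 0.25,
}

print(mutual_information(informative))
print(mutual_information(uninformative))
```

In the first toy case the past word is worth keeping in memory because it resolves the future word entirely; in the second it carries no predictive information. A continuous storage cost of this kind can fall anywhere between such extremes, in contrast to the discrete, uniform costs of grammar-based metrics.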