CL, Jun 7, 2024

Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences

arXiv:2406.04988v2, 26 citations
Originality: Incremental advance
AI Analysis

This work addresses a gap in psycholinguistic modeling: most analyses ignore individual differences between readers. Accounting for those differences yields incremental improvements in reading-time prediction accuracy.

The study investigated whether incorporating individual cognitive capacities improves the predictive power of language-model-derived surprisal and entropy on human reading times. It finds that doing so increases accuracy in most cases, and that the analyzed models emulate readers with lower verbal intelligence, which makes their predictability estimates less accurate for high-verbal-intelligence groups.

To date, most investigations on surprisal and entropy effects in reading have been conducted on the group level, disregarding individual differences. In this work, we revisit the predictive power of surprisal and entropy measures estimated from a range of language models (LMs) on data of human reading times as a measure of processing effort by incorporating information of language users' cognitive capacities. To do so, we assess the predictive power of surprisal and entropy estimated from generative LMs on reading data obtained from individuals who also completed a wide range of psychometric tests. Specifically, we investigate if modulating surprisal and entropy relative to cognitive scores increases prediction accuracy of reading times, and we examine whether LMs exhibit systematic biases in the prediction of reading times for cognitively high- or low-performing groups, revealing what type of psycholinguistic subject a given LM emulates. Our study finds that in most cases, incorporating cognitive capacities increases predictive power of surprisal and entropy on reading times, and that generally, high performance in the psychometric tests is associated with lower sensitivity to predictability effects. Finally, our results suggest that the analyzed LMs emulate readers with lower verbal intelligence, suggesting that for a given target group (i.e., individuals with high verbal intelligence), these LMs provide less accurate predictability estimates.
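For readers unfamiliar with the two predictability measures named in the abstract, the following is a minimal sketch of their standard information-theoretic definitions: surprisal is the negative log-probability of the word that actually occurred, and entropy is the expected surprisal over the whole next-word distribution. The function names and the toy distribution are illustrative assumptions, not taken from the paper, and the paper's own estimation pipeline (which LMs, tokenization, regression setup) is not shown here.

```python
import math

def surprisal(probs, token):
    """Surprisal in bits: -log2 p(token | context)."""
    return -math.log2(probs[token])

def entropy(probs):
    """Shannon entropy in bits of the next-word distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical next-word distribution after some context
# (toy numbers for illustration only, not data from the paper).
next_word = {"dog": 0.5, "cat": 0.25, "bird": 0.25}

s = surprisal(next_word, "cat")  # cost of the word that was actually read
h = entropy(next_word)           # uncertainty before the word is seen

print(f"surprisal(cat) = {s} bits, entropy = {h} bits")
```

In a reading-time analysis, values like `s` and `h` would typically enter a regression as per-word predictors; the abstract's "modulating relative to cognitive scores" would then plausibly correspond to letting their effect vary with a reader's psychometric score, though that modeling step is an assumption here and is not sketched.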

Code Implementations: 1 repo