N-gram-like Language Models Predict Reading Time Best
This work addresses a mismatch relevant to researchers and practitioners in psycholinguistics and NLP: improvements in language modeling do not necessarily bring a model's predictions closer to human cognitive processes such as reading.
The paper tackles the finding that advanced transformer language models, despite excelling at next-word prediction, yield probabilities that are worse at predicting reading time. It proposes that reading time is sensitive to simple n-gram statistics, and shows that the neural models whose predictions correlate most with n-gram probability are also the ones whose probabilities correlate most with eye-tracking-based metrics of reading time.
Recent work has found that contemporary language models such as transformers can become so good at next-word prediction that the probabilities they calculate become worse for predicting reading time. In this paper, we propose that this can be explained by reading time being sensitive to simple n-gram statistics rather than the more complex statistics learned by state-of-the-art transformer language models. We demonstrate that the neural language models whose predictions are most correlated with n-gram probability are also those that calculate probabilities that are the most correlated with eye-tracking-based metrics of reading time on naturalistic text.
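The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's actual pipeline: it assumes an add-one-smoothed bigram model as the "simple n-gram statistic", surprisal (negative log2 probability) as the per-word predictor, and Pearson correlation as the measure of agreement. The function names and the toy corpus are hypothetical, and the neural surprisals and gaze durations are placeholder values rather than real measurements.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigram contexts and bigrams in a token list (hypothetical helper)."""
    contexts = Counter(tokens[:-1])             # how often each word occurs as a context
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each adjacent pair occurs
    vocab_size = len(set(tokens))
    return contexts, bigrams, vocab_size


def bigram_surprisal(prev, word, contexts, bigrams, vocab_size):
    """Surprisal in bits of `word` given `prev`, with add-one smoothing (an assumption)."""
    prob = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return -math.log2(prob)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy corpus and test sentence, for illustration only.
corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)
sentence = corpus
ngram_surprisal = [
    bigram_surprisal(prev, word, *model)
    for prev, word in zip(sentence, sentence[1:])
]

# Placeholder values, NOT real data: per-word surprisal from some neural
# language model and per-word gaze duration (ms) from an eye-tracking corpus.
hypothetical_neural_surprisal = [2.1, 1.4, 1.6, 1.5, 2.4]
hypothetical_gaze_ms = [240.0, 205.0, 210.0, 200.0, 255.0]

# The two quantities the abstract relates across models:
# (1) how closely a neural model tracks n-gram probability, and
# (2) how closely that model's probabilities track reading time.
r_ngram = pearson(hypothetical_neural_surprisal, ngram_surprisal)
r_reading = pearson(hypothetical_neural_surprisal, hypothetical_gaze_ms)
print(f"correlation with n-gram surprisal: {r_ngram:.3f}")
print(f"correlation with reading time:     {r_reading:.3f}")
```

Under these assumptions, repeating the last step for several neural models gives one pair of correlations per model. The abstract's claim corresponds to the models with the highest first correlation also having the highest second correlation.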