CL | Jul 29, 2025

How Well Does First-Token Entropy Approximate Word Entropy as a Psycholinguistic Predictor?

arXiv:2507.22209v1 | 1 citation | h-index: 9 | IJCNLP-AACL
Originality: Synthesis-oriented
AI Analysis

This paper addresses a methodological issue for psycholinguistic researchers, but its contribution is incremental: it refines an existing approximation rather than introducing a new paradigm.

The study examines how word entropy is approximated in psycholinguistics by comparing first-token entropy estimates with Monte Carlo word entropy estimates. The two measures yield divergent results in reading-time regressions, which cautions against relying on first-token approximations.

Contextual entropy is a psycholinguistic measure capturing the anticipated difficulty of processing a word just before it is encountered. Recent studies have tested for entropy-related effects as a potential complement to well-known effects from surprisal. For convenience, entropy is typically estimated based on a language model's probability distribution over a word's first subword token. However, this approximation results in underestimation and potential distortion of true word entropy. To address this, we generate Monte Carlo (MC) estimates of word entropy that allow words to span a variable number of tokens. Regression experiments on reading times show divergent results between first-token and MC word entropy, suggesting a need for caution in using first-token approximations of contextual entropy.
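The contrast between the two estimates can be illustrated with a minimal sketch. It is not the authors' implementation: the toy language model `TOY_LM`, the end-of-word marker `EOW`, and all function names are hypothetical, and the estimator shown is the generic one that averages negative log-probabilities of sampled words. When each word has a single tokenization, the chain rule gives word entropy as first-token entropy plus the expected entropy of the remaining tokens, so the first-token value can only be lower or equal.

```python
import math
import random

EOW = "</w>"  # hypothetical end-of-word marker (an assumption of this sketch)

# Toy next-token distributions for one fixed context, keyed by the tokens
# already generated inside the current word. All values are invented.
TOY_LM = {
    (): {"un": 0.5, "cat": 0.3, "dog": 0.2},
    ("un",): {"happy": 0.6, "kind": 0.4},
    ("un", "happy"): {EOW: 1.0},
    ("un", "kind"): {EOW: 1.0},
    ("cat",): {EOW: 0.7, "s": 0.3},
    ("cat", "s"): {EOW: 1.0},
    ("dog",): {EOW: 1.0},
}


def entropy_bits(dist):
    """Shannon entropy (in bits) of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def first_token_entropy(lm):
    """Entropy of the distribution over the word's first subword token only."""
    return entropy_bits(lm[()])


def sample_word(lm, rng):
    """Sample tokens until EOW; return the word's tokens and its log2 probability."""
    prefix, logp = (), 0.0
    while True:
        dist = lm[prefix]
        tokens = list(dist)
        tok = rng.choices(tokens, weights=[dist[t] for t in tokens])[0]
        logp += math.log2(dist[tok])
        if tok == EOW:
            return prefix, logp
        prefix += (tok,)


def mc_word_entropy(lm, n_samples, rng):
    """Monte Carlo estimate of word entropy: mean of -log2 p(w) over samples."""
    total = 0.0
    for _ in range(n_samples):
        _, logp = sample_word(lm, rng)
        total -= logp
    return total / n_samples


def exact_word_entropy(lm, prefix=(), p=1.0):
    """Exact word entropy by enumerating every word (feasible only for toy models)."""
    h = 0.0
    for tok, q in lm[prefix].items():
        if tok == EOW:
            h -= p * q * math.log2(p * q)
        else:
            h += exact_word_entropy(lm, prefix + (tok,), p * q)
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    print("first-token entropy:", first_token_entropy(TOY_LM))
    print("MC word entropy:    ", mc_word_entropy(TOY_LM, 20000, rng))
    print("exact word entropy: ", exact_word_entropy(TOY_LM))
```

With a real language model, exact enumeration is not feasible because the set of possible words is open-ended, which is why sampling is used. Word boundaries would also have to be detected from the tokenizer's own conventions rather than from an explicit marker.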
