CL · May 14

Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models

arXiv:2605.15365
AI Analysis

This work provides a resource-rational account of human communication under vocabulary constraints, with implications for psycholinguistics, second-language communication, and language impairments.

Humans constrained to a small vocabulary (e.g., 250 words) communicate using strategies that resemble greedy sampling more than globally optimal planning, though more skilled individuals show non-greedy backtracking. Both greedy and optimal models reproduce the human tendency to rely on semantically light words under high constraint.

Communicating using only a limited vocabulary is a common but challenging cognitive phenomenon, requiring an ideal communicator to plan carefully to optimize for intelligibility while circumventing a constrained lexicon. In this work, we investigate how humans respond to a broad array of questions under variable vocabulary limitations, consisting of only 250 highly frequent words at the most restrictive. We provide theoretically motivated comparisons to greedy and globally optimal sampling algorithms using Sequential Monte Carlo inference with large language models. Humans generally resemble greedy sampling more than globally optimal sampling, though more skilled humans are more likely to backtrack and revise -- a non-greedy behavior. An observed human pattern of leaning on semantically light words in high-constraint settings falls out of both greedy and globally optimal sampling. We discuss the results and their broader implications for resource-rational cognition, psycholinguistics, L2 communication, and language impairments.
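The contrast between greedy and globally optimal sampling can be made concrete with a toy sketch. It is not the paper's implementation: a hypothetical two-step bigram table (`BIGRAM`) stands in for the large language model, `ALLOWED` plays the role of the restricted vocabulary, and "greedy" is read here as sampling each word from the model's next-word distribution with disallowed words masked out and the rest renormalized. The Sequential Monte Carlo version uses the same proposal but reweights each particle by the probability mass that survived the mask, so the particle population approximates the model's distribution conditioned on the whole sequence staying in vocabulary.

```python
import random

# Hypothetical toy bigram model standing in for an LLM (illustrative numbers).
# BIGRAM[prev][word] = P(word | prev)
BIGRAM = {
    "<s>":  {"it": 0.5, "this": 0.5},
    "it":   {"is": 0.9, "purrs": 0.1},
    "this": {"is": 0.1, "feline": 0.9},
}
ALLOWED = {"it", "this", "is"}  # hypothetical restricted vocabulary


def allowed_dist(prev, allowed):
    """Mask disallowed words; return (renormalized distribution, kept mass Z).

    Assumes every context has at least one allowed continuation (Z > 0).
    """
    kept = {w: p for w, p in BIGRAM[prev].items() if w in allowed}
    z = sum(kept.values())
    return {w: p / z for w, p in kept.items()}, z


def greedy_sample(length, allowed, rng):
    """Locally constrained sampling: no look-ahead, no correction."""
    prev, out = "<s>", []
    for _ in range(length):
        dist, _ = allowed_dist(prev, allowed)
        prev = rng.choices(list(dist), weights=list(dist.values()))[0]
        out.append(prev)
    return out


def smc_sample(length, allowed, rng, n_particles=1000):
    """SMC with the same local proposal, reweighted by the kept mass Z.

    Returns an equally weighted particle population after the final resampling.
    """
    particles = [[] for _ in range(n_particles)]
    for _ in range(length):
        weights = []
        for particle in particles:
            prev = particle[-1] if particle else "<s>"
            dist, z = allowed_dist(prev, allowed)
            particle.append(rng.choices(list(dist), weights=list(dist.values()))[0])
            weights.append(z)  # incremental importance weight for this step
        # Resample so particles that wandered into low-mass contexts die out.
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        particles = [list(p) for p in chosen]
    return particles


def frac_starting_with(word, sequences):
    """Share of sequences whose first word is `word`."""
    return sum(seq[0] == word for seq in sequences) / len(sequences)


if __name__ == "__main__":
    rng = random.Random(0)
    greedy = [greedy_sample(2, ALLOWED, rng) for _ in range(1000)]
    print("greedy starts with 'it':", frac_starting_with("it", greedy))
    print("SMC starts with 'it':   ", frac_starting_with("it", smc_sample(2, ALLOWED, rng)))
```

In this toy, both first words are equally likely under the model, but "this" is usually followed by a word outside the vocabulary. The greedy sampler still opens with "this" about half the time, whereas the reweighted particles open with "it" about 90% of the time (0.45 versus 0.05 unnormalized mass), which is the kind of planning ahead that separates the two strategies.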
