CL, AI | Sep 23, 2025

GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models

Dylan Hutson, Daniel Vennemeyer, Aneesh Deshmukh, Justin Zhan, Tianyu Jiang

arXiv:2509.19593v1 | 1 citation | h-index: 2 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving interactive reasoning in LLMs for applications like AI assistants, though it is incremental as it builds on existing evaluation methods.

The paper addresses how to evaluate the informativeness of the open-ended questions that large language models (LLMs) ask during interactive reasoning. It introduces GuessingGame, a protocol that scores question quality with information gain (IG) metrics, and reports that a one-standard-deviation increase in IG is associated with a 43% reduction in expected game length.

We introduce GuessingGame, a protocol for evaluating large language models (LLMs) as strategic question-askers in open-ended, open-domain settings. A Guesser LLM identifies a hidden object by posing free-form questions to an Oracle without predefined choices or candidate lists. To measure question quality, we propose two information gain (IG) metrics: a Bayesian method that tracks belief updates over semantic concepts using LLM-scored relevance, and an entropy-based method that filters candidates via ConceptNet. Both metrics are model-agnostic and support post hoc analysis. Across 858 games with multiple models and prompting strategies, higher IG strongly predicts efficiency: a one-standard-deviation IG increase reduces expected game length by 43%. Prompting constraints guided by IG, such as enforcing question diversity, enable weaker models to significantly improve performance. These results show that question-asking in LLMs is both measurable and improvable, and crucial for interactive reasoning.
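To make the two IG ideas concrete, here is a minimal Python sketch of how such metrics could be computed. It is an illustration under stated assumptions, not the paper's implementation: the function names, the uniform prior over candidates, the use of relevance scores in [0, 1] as likelihoods, and the definition of IG as entropy reduction in bits are all assumptions made for this example. The `keep` predicate stands in for a ConceptNet lookup and the `relevance` dictionary stands in for LLM-scored relevance.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def filtering_information_gain(candidates, keep):
    """Entropy-style IG: assume a uniform belief over candidates and
    measure how many bits are gained when an answer filters the set.

    `keep` is a hypothetical predicate standing in for a knowledge-base
    check (for example, a ConceptNet relation lookup).
    """
    remaining = [c for c in candidates if keep(c)]
    if not remaining:
        raise ValueError("the answer eliminated every candidate")
    gain = math.log2(len(candidates)) - math.log2(len(remaining))
    return gain, remaining


def bayesian_information_gain(prior, relevance):
    """Bayesian-style IG: reweight a belief over concepts by relevance
    scores (treated here as likelihoods) and report the entropy drop.

    `prior` maps concept -> probability; `relevance` maps concept -> a
    score in [0, 1], a stand-in for LLM-scored relevance.
    """
    unnormalized = {c: p * relevance.get(c, 0.0) for c, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("relevance scores rule out every concept")
    posterior = {c: w / total for c, w in unnormalized.items()}
    gain = entropy(prior.values()) - entropy(posterior.values())
    return gain, posterior


# Toy example: "Is it an animal?" answered "yes" over four candidates.
candidates = ["cat", "dog", "car", "bus"]
animals = {"cat", "dog"}
filter_gain, remaining = filtering_information_gain(
    candidates, lambda c: c in animals
)

prior = {c: 0.25 for c in candidates}
relevance = {"cat": 1.0, "dog": 1.0, "car": 0.0, "bus": 0.0}
bayes_gain, posterior = bayesian_information_gain(prior, relevance)

print(filter_gain, remaining)
print(bayes_gain, posterior)
```

In this toy case both functions report one bit of gain, because the answer halves a uniform candidate set. With soft relevance scores the entropy difference can be negative for a single update, so an analysis built on this sketch might prefer an expected IG over possible answers or a KL-divergence formulation.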
