CL · AI · Sep 23, 2025

GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models

arXiv:2509.19593v1 · 1 citation · h-index: 2 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving interactive reasoning in LLMs for applications like AI assistants, though it is incremental as it builds on existing evaluation methods.

The paper evaluates how informative the open-ended questions asked by large language models (LLMs) are during interactive reasoning. It introduces GuessingGame, a protocol that scores question quality with information gain (IG) metrics, and reports that a one-standard-deviation increase in IG is associated with a 43% reduction in expected game length.

We introduce GuessingGame, a protocol for evaluating large language models (LLMs) as strategic question-askers in open-ended, open-domain settings. A Guesser LLM identifies a hidden object by posing free-form questions to an Oracle without predefined choices or candidate lists. To measure question quality, we propose two information gain (IG) metrics: a Bayesian method that tracks belief updates over semantic concepts using LLM-scored relevance, and an entropy-based method that filters candidates via ConceptNet. Both metrics are model-agnostic and support post hoc analysis. Across 858 games with multiple models and prompting strategies, higher IG strongly predicts efficiency: a one-standard-deviation IG increase reduces expected game length by 43%. Prompting constraints guided by IG, such as enforcing question diversity, enable weaker models to significantly improve performance. These results show that question-asking in LLMs is both measurable and improvable, and crucial for interactive reasoning.
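The abstract names the two IG metrics without giving their formulas, so the following is only a minimal Python sketch of the general idea, not the paper's implementation. It assumes a uniform belief over candidates for the entropy-based variant and a simple prior-times-likelihood update for the Bayesian variant. The function names, the `is_animal` predicate (standing in for a ConceptNet lookup), and the hand-written relevance scores (standing in for LLM-scored relevance) are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_ig(candidates, is_consistent):
    """IG from filtering a uniformly weighted candidate set.

    Assumption: every candidate is equally likely, so entropy is
    log2(size) and IG is the drop in log2(size) after filtering.
    """
    remaining = [c for c in candidates if is_consistent(c)]
    if not remaining:
        raise ValueError("the answer eliminated every candidate")
    ig = math.log2(len(candidates)) - math.log2(len(remaining))
    return ig, remaining


def bayesian_ig(prior, relevance):
    """IG from a Bayesian belief update over concepts.

    prior:     dict mapping concept -> prior probability
    relevance: dict mapping concept -> score in [0, 1], used here as the
               likelihood of the Oracle's answer given that concept
               (an assumption; the paper obtains such scores from an LLM)
    """
    unnormalised = {c: p * relevance.get(c, 0.0) for c, p in prior.items()}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("the answer is incompatible with every concept")
    posterior = {c: v / total for c, v in unnormalised.items()}
    ig = entropy(prior.values()) - entropy(posterior.values())
    return ig, posterior


# Toy game: the Guesser asks "Is it an animal?" and the Oracle says yes.
candidates = ["cat", "dog", "car", "boat"]
animals = {"cat", "dog"}  # hypothetical stand-in for a ConceptNet query


def is_animal(candidate):
    return candidate in animals


ig_entropy, remaining = entropy_ig(candidates, is_animal)

prior = {c: 1 / len(candidates) for c in candidates}
relevance = {"cat": 1.0, "dog": 1.0, "car": 0.0, "boat": 0.0}
ig_bayes, posterior = bayesian_ig(prior, relevance)

print(f"entropy-based IG: {ig_entropy:.2f} bits, remaining: {remaining}")
print(f"Bayesian IG:      {ig_bayes:.2f} bits")
```

In this toy case the question halves the candidate set, so both variants report one bit of information gain. With graded relevance scores (for example 0.7 instead of 1.0), the Bayesian variant would shift belief without discarding any concept outright, which is the main practical difference between the two sketches.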

