Adaptive Elicitation of Latent Information Using Natural Language
This work addresses the challenge of improving information-gathering strategies in natural language applications such as education and diagnostics. It is somewhat incremental, building on existing LLM capabilities, but contributes a novel approach to uncertainty quantification.
The paper tackles the problem of strategically gathering information in natural language to reduce uncertainty about latent entities, such as a student's learning or a user's preferences. It proposes an adaptive elicitation framework that uses meta-learned language models for uncertainty quantification and query selection, and the resulting method consistently outperforms baselines on tasks including the 20 questions game and adaptive student assessment.
Eliciting information to reduce uncertainty about a latent entity is a critical task in many application domains, e.g., assessing individual student learning outcomes, diagnosing underlying diseases, or learning user preferences. Though natural language is a powerful medium for this purpose, large language models (LLMs) and existing fine-tuning algorithms lack mechanisms for strategically gathering information to refine their own understanding of the latent entity. To harness the generalization power and world knowledge of LLMs in developing effective information-gathering strategies, we propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity. Since probabilistic modeling of an abstract latent entity is difficult, our framework adopts a predictive view of uncertainty, using a meta-learned language model to simulate future observations and enable scalable uncertainty quantification over complex natural language. Through autoregressive forward simulation, our model quantifies how new questions reduce epistemic uncertainty, enabling the development of sophisticated information-gathering strategies to choose the most informative next queries. In experiments on the 20 questions game, dynamic opinion polling, and adaptive student assessment, our method consistently outperforms baselines in identifying critical unknowns and improving downstream predictions, illustrating the promise of strategic information gathering in natural language settings.
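The query-selection idea in the abstract (simulate possible answers to a candidate question, measure how much they would reduce predictive uncertainty, and ask the most informative question) can be illustrated with a minimal sketch. It rests on loud assumptions that are not from the paper: a small table of hypothetical entities with exact Bayesian conditioning stands in for the meta-learned language model, answers are enumerated exactly instead of sampled autoregressively, and uncertainty is the summed predictive entropy over a few hypothetical held-out target questions. All names (`ENTITIES`, `predictive`, `select_query`, the question keys) are illustrative only, not the authors' code.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy world (assumption): each latent entity answers yes/no questions
# deterministically. In the paper, a meta-learned LLM plays this role instead.
ENTITIES: Dict[str, Dict[str, str]] = {
    "cat":    {"is_animal": "yes", "can_fly": "no",  "is_pet": "yes", "lives_in_water": "no"},
    "eagle":  {"is_animal": "yes", "can_fly": "yes", "is_pet": "no",  "lives_in_water": "no"},
    "salmon": {"is_animal": "yes", "can_fly": "no",  "is_pet": "no",  "lives_in_water": "yes"},
    "plane":  {"is_animal": "no",  "can_fly": "yes", "is_pet": "no",  "lives_in_water": "no"},
}
TARGETS = ["can_fly", "lives_in_water", "is_pet"]                # held-out questions to predict
CANDIDATES = ["is_animal", "can_fly", "is_pet", "lives_in_water"]  # questions we may ask

History = List[Tuple[str, str]]  # (question, answer) pairs observed so far


def predictive(question: str, history: History) -> Dict[str, float]:
    """p(answer | question, history). Computed exactly here by conditioning a
    uniform prior over ENTITIES on the history (stand-in for an LLM)."""
    consistent = [e for e, ans in ENTITIES.items()
                  if all(ans[q] == a for q, a in history)]
    probs: Dict[str, float] = {}
    for e in consistent:
        a = ENTITIES[e][question]
        probs[a] = probs.get(a, 0.0) + 1.0 / len(consistent)
    return probs


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def target_uncertainty(targets: List[str], history: History) -> float:
    """Predictive view of uncertainty: total entropy over answers to target questions."""
    return sum(entropy(predictive(t, history)) for t in targets)


def expected_uncertainty_after(query: str, targets: List[str], history: History) -> float:
    """One-step forward simulation: average the target uncertainty over each
    possible answer to `query`, weighted by its predictive probability."""
    return sum(p * target_uncertainty(targets, history + [(query, a)])
               for a, p in predictive(query, history).items())


def select_query(candidates: List[str], targets: List[str], history: History) -> str:
    """Greedy choice: the candidate with the lowest expected remaining uncertainty."""
    return min(candidates, key=lambda q: expected_uncertainty_after(q, targets, history))


if __name__ == "__main__":
    print(select_query(CANDIDATES, TARGETS, []))  # → can_fly
```

Here `can_fly` wins because its two equally likely answers split the entities so that, on average, the target predictions become most certain. In the paper's setting, `predictive` would instead be a language model conditioned on the question-answer history, and the expectation would presumably be estimated from sampled simulated answers rather than exhaustive enumeration.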