CL · Apr 13

YIELD: A Large-Scale Dataset and Evaluation Framework for Information Elicitation Agents

arXiv:2604.10968 · 84.5 · h-index: 5 · Has Code

Predicted impact: top 54% in CL · last 90 days · Originality: Incremental advance

AI Analysis

This work provides the first large-scale dataset and formal framework for information elicitation agents, enabling systematic research in a previously underexplored area of conversational AI.

The paper introduces Information Elicitation Agents (IEAs) and presents YIELD, a 26M-token dataset of 2,281 human-to-human dialogues for training and evaluating such agents. Experiments show that training on YIELD improves alignment of LLMs with real elicitation behavior, corroborated by human evaluation.

Most conversational agents (CAs) are designed to satisfy user needs through user-driven interactions. However, many real-world settings, such as academic interviewing, judicial proceedings, and journalistic investigations, involve broader institutional decision-making processes and require agents that can elicit information from users. In this paper, we introduce Information Elicitation Agents (IEAs), whose goal is to elicit information from users in support of the agent's institutional or task-oriented objectives. To enable systematic research on this setting, we present YIELD, a 26M-token dataset of 2,281 ethically sourced, human-to-human dialogues. Moreover, we formalize information elicitation as a finite-horizon POMDP and propose novel metrics tailored to IEAs. Pilot experiments on multiple foundation LLMs show that training on YIELD improves their alignment with real elicitation behavior, and these findings are corroborated by human evaluation. We release YIELD under CC BY 4.0. The dataset, project code, evaluation tools, and fine-tuned model adapters are available at: https://github.com/infosenselab/yield.
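For readers unfamiliar with the formalism named in the abstract, a finite-horizon POMDP is conventionally written as below. This is the generic textbook definition, not the paper's own instantiation, which this listing does not give. Reading the hidden state as the information the user holds, actions as agent utterances, and observations as user replies is an assumption made here purely for illustration.

```latex
% Generic finite-horizon POMDP (textbook form; symbols are not taken from the paper).
% S: states, A: actions, \Omega: observations, H: horizon (number of turns).
\[
  \mathcal{M} = (S, A, \Omega, T, O, R, H), \qquad
  T(s' \mid s, a), \quad O(o \mid s', a), \quad R(s, a)
\]
% Belief update after taking action a_t and observing o_{t+1}:
\[
  b_{t+1}(s') \;\propto\; O(o_{t+1} \mid s', a_t) \sum_{s \in S} T(s' \mid s, a_t)\, b_t(s)
\]
% Objective: maximize expected cumulative reward over the H steps.
\[
  \pi^{*} = \arg\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{H-1} R(s_t, a_t) \right]
\]
```

Under that illustrative reading, the agent never observes what the user knows directly; it maintains a belief over it and chooses each question to improve the expected return within the fixed number of turns.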
