Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
This work addresses the challenge of representing inherent uncertainty in dialogue, with applications such as negotiation systems. Its contribution is incremental: it extends the existing conversation forecasting task with uncertainty-aware metrics and fine-tuning methods.
The paper tackles the problem of forecasting uncertainty in conversations by expanding the conversation forecasting task to use uncertainty-aware metrics, which effectively enable abstention on individual instances. Experiments on eight negotiation corpora show that the proposed fine-tuning strategies can calibrate smaller open-source models to compete with pre-trained models 10x their size.
Effective interlocutors account for the uncertain goals, beliefs, and emotions of others. But even the best human conversationalist cannot perfectly anticipate the trajectory of a dialogue. How well can language models represent inherent uncertainty in conversations? We propose FortUne Dial, an expansion of the long-standing "conversation forecasting" task: instead of just accuracy, evaluation is conducted with uncertainty-aware metrics, effectively enabling abstention on individual instances. We study two ways in which language models potentially represent outcome uncertainty (internally, using scores, and directly, using tokens) and propose fine-tuning strategies to improve calibration of both representations. Experiments on eight difficult negotiation corpora demonstrate that our proposed fine-tuning strategies (a traditional supervision strategy and an off-policy reinforcement learning strategy) can calibrate smaller open-source models to compete with pre-trained models 10x their size.
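To make the idea of uncertainty-aware evaluation concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the Brier score as the metric (a common choice for probabilistic forecasts, not confirmed by the text above) and a hypothetical confidence margin for abstention. The function names, the margin value, and the example forecasts are all illustrative assumptions.

```python
from typing import List, Optional


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def predict_or_abstain(prob: float, margin: float = 0.2) -> Optional[int]:
    """Commit to an outcome only when the forecast is far enough from 0.5.

    Returns 1 (e.g. "deal"), 0 ("no deal"), or None (abstain).
    The margin is a hypothetical threshold chosen for illustration.
    """
    if abs(prob - 0.5) < margin:
        return None
    return 1 if prob > 0.5 else 0


# Made-up forecasts of P(deal) for four conversations, with made-up outcomes.
probs = [0.9, 0.55, 0.2, 0.5]
outcomes = [1, 0, 0, 1]

score = brier_score(probs, outcomes)
decisions = [predict_or_abstain(p) for p in probs]

print(f"Brier score: {score:.4f}")
print(f"Decisions (None = abstain): {decisions}")
```

Unlike accuracy, a score of this kind rewards a forecaster for reporting a probability near 0.5 when the outcome is genuinely unclear, and the abstention rule shows how such probabilities could be turned into "who knows" decisions on individual instances.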