CLAI Dec 2, 2018

A Study on Dialogue Reward Prediction for Open-Ended Conversational Agents

arXiv:1812.00350v1
6 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of choosing the context window for dialogue reward prediction in conversational agents. The contribution is incremental: it focuses on one specific task and makes no broader state-of-the-art claims.

The study investigated how much dialogue history conversational agents need to reliably predict dialogue rewards. It found that lengthy histories of at least 10 sentences (25 performed best in the experiments) are preferred over short ones, and that they yield strong positive correlations between target and predicted rewards.

The amount of dialogue history to include in a conversational agent is often underestimated and/or set in an empirical and thus possibly naive way. This suggests that principled investigations into optimal context windows are urgently needed given that the amount of dialogue history and corresponding representations can play an important role in the overall performance of a conversational system. This paper studies the amount of history required by conversational agents for reliably predicting dialogue rewards. The task of dialogue reward prediction is chosen for investigating the effects of varying amounts of dialogue history and their impact on system performance. Experimental results using a dataset of 18K human-human dialogues report that lengthy dialogue histories of at least 10 sentences are preferred (25 sentences being the best in our experiments) over short ones, and that lengthy histories are useful for training dialogue reward predictors with strong positive correlations between target dialogue rewards and predicted ones.
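The experimental setup described above (vary the amount of history, then compare predicted rewards against target rewards) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how histories are truncated or which correlation measure is used, so the sketch keeps the most recent sentences and uses Pearson's coefficient. The helper names `truncate_history` and `pearson` are hypothetical and not taken from the paper.

```python
from math import sqrt
from typing import List, Sequence


def truncate_history(sentences: Sequence[str], max_sentences: int) -> List[str]:
    """Keep only the most recent `max_sentences` sentences of a dialogue.

    Assumption: history is measured in sentences and the newest ones are kept.
    """
    if max_sentences <= 0:
        return []
    return list(sentences[-max_sentences:])


def pearson(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Pearson correlation between target and predicted rewards.

    Assumption: Pearson's r stands in for the correlation measure,
    which the abstract does not name.
    """
    if len(targets) != len(predictions) or len(targets) < 2:
        raise ValueError("need two equally sized sequences of length >= 2")
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_t * var_p)


if __name__ == "__main__":
    # Toy dialogue, not taken from the paper's dataset.
    dialogue = [f"sentence {i}" for i in range(1, 31)]
    for window in (1, 10, 25):
        context = truncate_history(dialogue, window)
        print(window, len(context), context[0])
```

In a study of this kind, a reward predictor would be trained once per window size, and `pearson` would be evaluated on held-out target and predicted rewards for each window to compare them.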

