Adaptive Dialog Policy Learning with Hindsight and User Modeling
This work targets the efficiency of dialog policy learning, which matters to both users and developers. It appears incremental, however, because it builds on existing reinforcement learning methods and adds specific enhancements.
The paper addresses inefficient dialog policy learning, a problem made pressing by high interaction costs and the poor user experience of low-quality conversations. It develops the LHUA algorithm, which learns adaptively from both simulated and real users using hindsight and user modeling. The reported result is a higher success rate and better policy quality than the baselines.
Reinforcement learning methods have been used to compute dialog policies from language-based interaction experiences. Efficiency is of particular importance in dialog policy learning because of the considerable cost of interacting with people and the poor user experience that results from low-quality conversations. Aiming to improve the efficiency of dialog policy learning, we develop the LHUA algorithm (Learning with Hindsight, User modeling, and Adaptation), which, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcements, respectively. Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.
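The hindsight idea mentioned above can be illustrated with a minimal sketch. This is not the LHUA implementation. It shows only the general hindsight-relabeling technique, in which a dialog that missed its original goal is relabeled with the goal it actually reached, so that the agent receives a positive reinforcement from it. The `Transition` type, the slot-tuple state encoding, the function names, and the reward values are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Slots = Tuple[str, ...]  # hypothetical state encoding: names of the slots filled so far


@dataclass(frozen=True)
class Transition:
    state: Slots
    action: str
    next_state: Slots
    goal: Slots      # slots the user wanted filled
    reward: float


def success_reward(next_state: Slots, goal: Slots,
                   bonus: float = 1.0, step_cost: float = -0.1) -> float:
    """Assumed reward: a bonus once every goal slot is filled, a small cost otherwise."""
    return bonus if set(goal) <= set(next_state) else step_cost


def relabel_with_hindsight(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose goal is what the dialog actually achieved.

    The final state is treated as if it had been the goal all along, and
    rewards are recomputed against it, so the last step earns the bonus.
    """
    if not episode:
        return []
    achieved = episode[-1].next_state
    return [
        replace(t, goal=achieved, reward=success_reward(t.next_state, achieved))
        for t in episode
    ]
```

In a training loop, both the original and the relabeled transitions would typically be added to the replay buffer, so one unsuccessful dialog yields additional experience that includes a positive reward.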