Subgoal Discovery for Hierarchical Dialogue Policy Learning
This paper addresses the problem of efficient policy learning for complex goal-oriented dialogue agents. It offers an incremental improvement by automating subgoal discovery, removing the need for human-defined subgoals.
The paper tackles the challenge of sparse learning signals in long goal-oriented dialogues by proposing a divide-and-conquer approach that discovers subgoals from successful dialogues and uses them for hierarchical reinforcement learning. Experiments on travel planning show the method performs competitively against a state-of-the-art approach requiring human-defined subgoals.
Developing agents that engage in complex goal-oriented dialogues is challenging, partly because the main learning signals are very sparse in long conversations. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given successful example dialogues, we propose the Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a multi-level policy by hierarchical reinforcement learning. We demonstrate our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that our approach performs competitively against a state-of-the-art method that requires human-defined subgoals. Moreover, we show that the learned subgoals are often human-comprehensible.
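
The two-level control flow that a multi-level policy implies can be sketched roughly as follows. This is a toy illustration only, not the SDN or the authors' implementation: the subgoal names, slot names, and both policies are hypothetical and hard-coded, whereas the approach described above discovers subgoals from successful dialogues and learns both policy levels by reinforcement learning. The sketch shows only how a top-level policy selects a subgoal, how a low-level policy acts until that subgoal is complete, and where a dense per-subgoal (intrinsic) signal can supplement the sparse end-of-dialogue (extrinsic) reward.

```python
from dataclasses import dataclass, field

# Hypothetical subgoals for a toy travel-planning task. They are hard-coded
# here purely for illustration; they are NOT discovered from data.
SUBGOALS = {
    "book_flight": ["flight_date", "flight_dest"],
    "book_hotel": ["hotel_date", "hotel_city"],
}


@dataclass
class DialogueState:
    """Toy dialogue state: the set of slots the user has filled so far."""
    filled: set = field(default_factory=set)

    def subgoal_done(self, subgoal):
        return all(slot in self.filled for slot in SUBGOALS[subgoal])

    def task_done(self):
        return all(self.subgoal_done(g) for g in SUBGOALS)


def top_level_policy(state):
    """Pick the first unfinished subgoal (rule-based stand-in for a learned policy)."""
    for subgoal in SUBGOALS:
        if not state.subgoal_done(subgoal):
            return subgoal
    return None


def low_level_policy(state, subgoal):
    """Request the first unfilled slot of the active subgoal (rule-based stand-in)."""
    for slot in SUBGOALS[subgoal]:
        if slot not in state.filled:
            return slot
    return None


def run_episode():
    """Run one toy dialogue and return (trace, intrinsic, extrinsic) rewards."""
    state = DialogueState()
    trace = []
    intrinsic = 0.0
    while not state.task_done():
        subgoal = top_level_policy(state)
        # The low-level policy stays in control until the subgoal is complete.
        while not state.subgoal_done(subgoal):
            slot = low_level_policy(state, subgoal)
            state.filled.add(slot)  # assumption: the toy user always answers
            trace.append((subgoal, slot))
        intrinsic += 1.0  # dense signal: one unit per completed subgoal
    extrinsic = 1.0  # sparse signal: granted only once the whole task succeeds
    return trace, intrinsic, extrinsic


if __name__ == "__main__":
    trace, intrinsic, extrinsic = run_episode()
    for subgoal, slot in trace:
        print(f"{subgoal}: request {slot}")
    print("intrinsic reward:", intrinsic, "extrinsic reward:", extrinsic)
```

In this toy setting, the extrinsic reward arrives only after four turns, while the intrinsic reward gives feedback after every two. That gap grows with dialogue length, which is the motivation for decomposing the task in the first place.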