CL, AI | May 24, 2020

GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

arXiv:2005.11729v21.325 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of building more effective and autonomous chatbots for specific domains such as financial anti-fraud, though it is incremental because it builds on existing hierarchical reinforcement learning techniques.

The authors tackled the problem of building goal-oriented chatbots that rely less on hand-crafted rules or labeled data by proposing GoChat, a framework using hierarchical reinforcement learning for end-to-end training; it outperformed previous methods in response quality and goal success rate on a financial anti-fraud dataset.

A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which goes beyond language generation. However, existing dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets to reach their goals. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training chatbots end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework utilizes hierarchical reinforcement learning (HRL), where the high-level policy guides the conversation towards the final goal by determining sub-goals, and the low-level policy fulfills each sub-goal by generating the corresponding response utterance. In our experiments on a real-world dialogue dataset for financial anti-fraud, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
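The two-level control flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the sub-goal labels, the template responses, and the names `high_level_policy`, `low_level_policy`, `run_episode`, and `discounted_return` are all hypothetical, and hand-written rules stand in for the learned policies that GoChat would train with reinforcement learning.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sub-goal labels; the abstract does not enumerate the real ones.
SUB_GOALS = ["greet", "ask_identity", "ask_transaction", "close"]

# Hypothetical canned responses standing in for a learned utterance generator.
TEMPLATES = {
    "greet": "Hello, this is the bank's security team.",
    "ask_identity": "Could you confirm your name?",
    "ask_transaction": "Did you authorize the recent transfer?",
    "close": "Thank you, we will follow up shortly.",
}


@dataclass
class DialogueState:
    history: List[str] = field(default_factory=list)
    turn: int = 0


def high_level_policy(state: DialogueState) -> str:
    """Toy manager: choose the next sub-goal from the turn index.

    In an HRL setup this would be a learned policy over sub-goals.
    """
    return SUB_GOALS[min(state.turn, len(SUB_GOALS) - 1)]


def low_level_policy(state: DialogueState, sub_goal: str) -> str:
    """Toy worker: produce an utterance that serves the given sub-goal."""
    return TEMPLATES[sub_goal]


def run_episode(user_replies: List[str]) -> List[Tuple[str, str]]:
    """Alternate bot and user turns, recording (sub_goal, utterance) pairs."""
    state = DialogueState()
    trace = []
    for reply in user_replies:
        sub_goal = high_level_policy(state)
        utterance = low_level_policy(state, sub_goal)
        state.history.extend([utterance, reply])
        state.turn += 1
        trace.append((sub_goal, utterance))
    return trace


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Long-term return: sum of gamma**t * r_t, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    for goal, text in run_episode(["hi", "Alice", "yes"]):
        print(f"[{goal}] {text}")
    # A reward only at the final turn, e.g. when the goal is accomplished.
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
```

The split mirrors the design the abstract describes: the high-level function decides what the next turn should achieve, while the low-level function decides how to say it. A reward arriving only at the end of the dialogue is propagated back to earlier turns through the discounted return.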
