CL | AI | May 24, 2020

GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

arXiv:2005.11729v2 | 25 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of building more effective and autonomous chatbots for specific domains such as financial anti-fraud. The advance is incremental, since it builds on existing hierarchical reinforcement learning techniques.

The authors tackled the problem of building goal-oriented chatbots that rely less on hand-crafted rules or labeled data by proposing GoChat, a framework using hierarchical reinforcement learning for end-to-end training; it outperformed previous methods in response quality and goal success rate on a financial anti-fraud dataset.

A chatbot that converses like a human should be goal-oriented (i.e., be purposeful in conversation), which is beyond language generation. However, existing dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets to reach their goals. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for end-to-end training of chatbots to maximize the long-term return from offline multi-turn dialogue datasets. Our framework utilizes hierarchical reinforcement learning (HRL), where the high-level policy guides the conversation towards the final goal by determining some sub-goals, and the low-level policy fulfills the sub-goals by generating the corresponding utterance for response. In our experiments on a real-world dialogue dataset for anti-fraud in finance, our approach outperforms previous methods on both the quality of response generation and the success rate of accomplishing the goal.
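
The two-level control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the sub-goal names, the template utterances, the reward scheme, and every function name below are hypothetical, and both policies are fixed rules rather than learned models. It only shows how a high-level policy might pick a sub-goal each turn, how a low-level policy might turn that sub-goal into an utterance, and how a long-term (discounted) return could be computed over the episode.

```python
from dataclasses import dataclass, field

# Hypothetical sub-goals and canned utterances (assumptions, not from the paper).
# GoChat's low-level policy generates text; this toy uses a template lookup.
SUB_GOALS = ["greet", "verify_identity", "warn_about_fraud", "close"]
TEMPLATES = {
    "greet": "Hello, this is the security team.",
    "verify_identity": "Can you confirm the last transaction you made?",
    "warn_about_fraud": "Please do not share your verification code with anyone.",
    "close": "Thank you, your account is now protected.",
}


@dataclass
class DialogueState:
    history: list = field(default_factory=list)   # (sub_goal, utterance) pairs
    completed: set = field(default_factory=set)   # sub-goals already handled


def high_level_policy(state):
    """Choose the next sub-goal: here, simply the first one not yet completed."""
    for goal in SUB_GOALS:
        if goal not in state.completed:
            return goal
    return SUB_GOALS[-1]


def low_level_policy(sub_goal):
    """Produce an utterance for the chosen sub-goal (template stand-in)."""
    return TEMPLATES[sub_goal]


def run_episode(max_turns=6):
    """Roll out one dialogue; reward 1.0 only when the final goal is reached."""
    state = DialogueState()
    rewards = []
    for _ in range(max_turns):
        sub_goal = high_level_policy(state)
        utterance = low_level_policy(sub_goal)
        state.history.append((sub_goal, utterance))
        state.completed.add(sub_goal)
        done = sub_goal == "close"
        rewards.append(1.0 if done else 0.0)
        if done:
            break
    return state, rewards


def discounted_return(rewards, gamma=0.9):
    """Long-term return: sum of gamma**t * r_t, accumulated from the last turn."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

In a learned system, both policies would be trained so that the discounted return rises; here the sparse terminal reward simply makes shorter successful dialogues score higher.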
