CL AI Sep 18, 2018

Talking to myself: self-dialogues as data for conversational agents

Joachim Fainberg, Ben Krause, Mihai Dobre, Marco Damonte, Emmanuel Kahembwe, Daniel Duma, Bonnie Webber, Federico Fancellu

arXiv:1809.06641v22.314 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the data scarcity problem facing conversational AI developers, though the advance is incremental because it builds on existing data collection methods.

The paper tackles the challenge of limited training data for conversational agents by introducing a novel method using crowd-sourced self-dialogues, resulting in a corpus of 3.6 million words across 23 topics.

Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data-driven manner is challenging due to a lack of suitable corpora. This paper presents a novel method for gathering topical, unstructured conversational data in an efficient way: self-dialogues through crowd-sourcing. Alongside this paper, we include a corpus of 3.6 million words across 23 topics. We argue the utility of the corpus by comparing self-dialogues with standard two-party conversations as well as data from other corpora.
