CL · AI · Oct 23, 2022

BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets

arXiv:2210.12687v1 · 297 citations · h-index: 13 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the need for large-scale, automatically curated datasets for training chatbots that blend multiple skills. Its contribution is incremental: it automates a dataset-creation step.

The paper tackles the problem of building open-domain chatbots with diverse communicative skills. It proposes BotsTalk, a framework in which multiple agents, each grounded to a specific target skill, converse with one another to automatically annotate multi-skill dialogues. The authors use it to create BSBT, a dataset of 300K conversations, and report that the dataset is effective for multi-skill dialogue systems.

To build open-domain chatbots that are able to use diverse communicative skills, we propose a novel framework BotsTalk, where multiple agents grounded to the specific target skills participate in a conversation to automatically annotate multi-skill dialogues. We further present Blended Skill BotsTalk (BSBT), a large-scale multi-skill dialogue dataset comprising 300K conversations. Through extensive experiments, we demonstrate that our dataset can be effective for multi-skill dialogue systems which require an understanding of skill blending as well as skill grounding. Our code and data are available at https://github.com/convei-lab/BotsTalk.

Code Implementations: 1 repo
