BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets
The work addresses the need for large-scale, automatically curated datasets for training chatbots that blend multiple skills, though its contribution, automating dataset creation, is incremental.
The paper tackles the problem of building open-domain chatbots with diverse communicative skills. It proposes BotsTalk, a framework in which multiple agents, each grounded in a specific target skill, converse to automatically annotate multi-skill dialogues. The resulting dataset, Blended Skill BotsTalk (BSBT), contains 300K conversations, and experiments show it is effective for multi-skill dialogue systems.
To build open-domain chatbots that are able to use diverse communicative skills, we propose a novel framework BotsTalk, where multiple agents grounded to the specific target skills participate in a conversation to automatically annotate multi-skill dialogues. We further present Blended Skill BotsTalk (BSBT), a large-scale multi-skill dialogue dataset comprising 300K conversations. Through extensive experiments, we demonstrate that our dataset can be effective for multi-skill dialogue systems which require an understanding of skill blending as well as skill grounding. Our code and data are available at https://github.com/convei-lab/BotsTalk.
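The idea of skill-grounded agents jointly producing an annotated dialogue can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes that each agent proposes one candidate utterance per turn, that a scoring function picks the best candidate, and that the chosen utterance is labeled with the skill of the agent that produced it. The names `SkillAgent`, `simulate_dialogue`, `overlap_score`, and `toy_agents`, the skill labels, and the word-overlap scorer are all hypothetical placeholders; a real system would use trained dialogue models and a learned response selector.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SkillAgent:
    """Hypothetical agent tied to one skill; `respond` maps history to a candidate."""
    skill: str
    respond: Callable[[List[str]], str]


def overlap_score(history: List[str], candidate: str) -> int:
    """Toy scorer (an assumption): word overlap with the last utterance.

    Candidates already present in the history get -1 to discourage repetition.
    """
    if candidate in history:
        return -1
    last_tokens = set(history[-1].lower().split())
    return len(last_tokens & set(candidate.lower().split()))


def simulate_dialogue(
    agents: List[SkillAgent],
    score: Callable[[List[str], str], int],
    seed_utterance: str,
    num_turns: int,
) -> List[Tuple[str, str]]:
    """Return (utterance, skill) pairs; the skill label is the automatic annotation."""
    history = [seed_utterance]
    annotated = [(seed_utterance, "seed")]
    for _ in range(num_turns):
        # Every skill agent proposes one candidate for the current turn.
        candidates = [(agent.skill, agent.respond(history)) for agent in agents]
        # Keep the highest-scoring candidate; ties go to the earliest agent.
        skill, utterance = max(candidates, key=lambda c: score(history, c[1]))
        history.append(utterance)
        annotated.append((utterance, skill))
    return annotated


# Toy agents with fixed responses, standing in for skill-specific dialogue models.
toy_agents = [
    SkillAgent("persona", lambda h: "I love hiking with my dog."),
    SkillAgent("knowledge", lambda h: "Hiking trails are often marked with painted blazes."),
    SkillAgent("empathy", lambda h: "That sounds really exciting!"),
]

if __name__ == "__main__":
    for utterance, skill in simulate_dialogue(
        toy_agents, overlap_score, "I went hiking yesterday.", num_turns=3
    ):
        print(f"[{skill}] {utterance}")
```

In this sketch the skill label attached to each selected utterance plays the role of the annotation, so a single conversation can mix turns from several skills without human labeling.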