CL Jun 5, 2021

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung

arXiv:2106.02787v17.668 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the problem of developing robust end-to-end dialogue systems for multilingual regions, though the advance is incremental: it extends existing datasets to a bilingual format.

The authors tackle the lack of multilingual datasets for task-oriented dialogue modeling by introducing BiToD, a bilingual multi-domain dataset with over 7k dialogues and 144k utterances. The dataset serves as a benchmark, and the baselines show that bilingual training and cross-lingual transfer are effective.

Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches. We provide state-of-the-art baselines under three evaluation settings (monolingual, bilingual, and cross-lingual). The analysis of our baselines in different settings highlights 1) the effectiveness of training a bilingual ToD system compared to two independent monolingual ToD systems, and 2) the potential of leveraging a bilingual knowledge base and cross-lingual transfer learning to improve system performance under low-resource conditions.
