CL, Jun 5, 2021

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

arXiv:2106.02787v1, 68 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of developing robust end-to-end dialogue systems for multilingual regions. The contribution is incremental, since it extends existing datasets to a bilingual format.

The authors address the lack of multilingual datasets for task-oriented dialogue modeling by introducing BiToD, a bilingual multi-domain dataset with over 7k dialogues and 144k utterances. It serves as a benchmark, and the baseline results show the benefits of bilingual training and cross-lingual transfer.

Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches. We provide state-of-the-art baselines under three evaluation settings (monolingual, bilingual, and cross-lingual). The analysis of our baselines in different settings highlights 1) the effectiveness of training a bilingual ToD system compared to two independent monolingual ToD systems, and 2) the potential of leveraging a bilingual knowledge base and cross-lingual transfer learning to improve system performance under low-resource conditions.

Code Implementations: 1 repo