CL Aug 27, 2022

MDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages

Qingyu Zhang, Xiaoyu Shen, Ernie Chang, Jidong Ge, Pengke Chen

arXiv:2208.13078

Originality: Synthesis-oriented

AI Analysis

This paper provides a new benchmark for multilingual dialogue generation, addressing a gap for low-resource languages, though the contribution is incremental because it builds on existing models and datasets.

The authors tackled the lack of multilingual dialogue datasets by introducing mDIA, a benchmark covering 46 languages, and found that mT5-based models outperform DialoGPT on sacreBLEU and BERTScore but trail it on diversity, with a large quality gap between English and other languages.

Owing to the lack of corpora for low-resource languages, current work on dialogue generation has mainly focused on English. In this paper, we present mDIA, the first large-scale multilingual benchmark for dialogue generation across low- to high-resource languages. It covers real-life conversations in 46 languages across 19 language families. We present baseline results obtained by fine-tuning the multilingual, non-dialogue-focused pre-trained model mT5 as well as the English-centric, dialogue-focused pre-trained chatbot DialoGPT. The results show that mT5-based models perform better on sacreBLEU and BERTScore but worse on diversity. Even though promising results are found in few-shot and zero-shot scenarios, there is a large gap between the generation quality in English and other languages. We hope that the release of mDIA will encourage more work on multilingual dialogue generation to promote language diversity.
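
The abstract reports that mT5-based models score worse on diversity but does not say here how diversity is measured. A common choice in dialogue generation is distinct-n, the ratio of unique n-grams to total n-grams across generated responses. The sketch below assumes that metric and whitespace tokenization; both are illustrative assumptions, not the paper's confirmed setup, and whitespace splitting would not suit languages written without spaces. The function name `distinct_n` and the sample responses are hypothetical.

```python
from collections import Counter


def distinct_n(responses, n):
    """Return unique n-grams divided by total n-grams over all responses."""
    ngrams = Counter()
    for response in responses:
        # Assumption: whitespace tokenization, for illustration only.
        tokens = response.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    # An empty set of responses has no n-grams; report 0.0 rather than divide by zero.
    return len(ngrams) / total if total else 0.0


# Hypothetical model outputs: one repetitive system, one varied system.
generic = ["i do not know", "i do not know", "i do not know"]
varied = ["i do not know", "maybe try the museum", "that sounds fun"]

print(distinct_n(generic, 1), distinct_n(varied, 1))
```

A model that keeps producing the same safe reply gets a low distinct-n score even when its overlap with the references is acceptable, which is why diversity is reported alongside overlap metrics such as sacreBLEU.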

