CL · May 23, 2022

BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla

arXiv:2205.11081v4 · 285 citations · h-index: 24 · Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the lack of resources for low-resource NLG in Bangla, providing a benchmark and a pretrained model to support future work. Its contribution is incremental, since it applies existing methods to a new language.

This work tackles the lack of resources for natural language generation (NLG) in Bangla by introducing BanglaNLG, a benchmark with six tasks including a new dialogue dataset, and BanglaT5, a pretrained Transformer model that achieves state-of-the-art performance with up to 9% absolute and 32% relative gains over multilingual models.

This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a widely spoken yet low-resource language. We aggregate six challenging conditional text generation tasks under the BanglaNLG benchmark, introducing a new dataset on dialogue generation in the process. Furthermore, using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer language model for Bangla. BanglaT5 achieves state-of-the-art performance in all of these tasks, outperforming several multilingual models by up to 9% absolute gain and 32% relative gain. We are making the new dialogue dataset and the BanglaT5 model publicly available at https://github.com/csebuetnlp/BanglaNLG in the hope of advancing future research on Bangla NLG.
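The abstract reports both an absolute and a relative gain, which measure different things: the absolute gain is the raw difference between two scores, while the relative gain expresses that difference as a percentage of the baseline. The sketch below illustrates the distinction. The scores are hypothetical placeholders chosen for easy arithmetic, not results from the paper, and the function names are illustrative only.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Raw difference between the new score and the baseline, in score points."""
    return new - baseline


def relative_gain(baseline: float, new: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - baseline) / baseline * 100.0


# Hypothetical metric values (NOT taken from the paper), on a 0-100 scale.
baseline_score = 20.0  # e.g. a multilingual model
new_score = 25.0       # e.g. a language-specific model

print(f"absolute gain: {absolute_gain(baseline_score, new_score):.1f} points")
print(f"relative gain: {relative_gain(baseline_score, new_score):.1f}%")
```

A small absolute gain on a low baseline can therefore correspond to a large relative gain, which is why the two figures are usually quoted together.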
