cs.CL · Aug 24, 2024

Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation

arXiv:2409.00071v1 · 2 citations · h-index: 1

Originality: Synthesis-oriented

AI Analysis

This is an incremental step in low-resource machine translation that addresses data scarcity.

The paper tackles low-resource neural machine translation by proposing a generative-adversarial network for data augmentation, which shows promise at generating monolingual sentences when trained on fewer than 20,000 sentences.

Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack the large-scale corpora needed to train models. As manual data curation is expensive and time-consuming, we propose using a generative-adversarial network (GAN) to augment low-resource language data. When trained on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential for data augmentation, generating monolingual sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes a first step toward investigating the capability of GANs in low-resource NMT, and our results suggest that extending GANs to low-resource NMT is a promising direction for future work.
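The abstract does not state the training objective. For reference only, a GAN in its standard form (Goodfellow et al., 2014) trains a generator G against a discriminator D with the minimax objective below. Applied to this setting, x would be a real sentence from the small monolingual corpus and z a random noise vector from which G produces a synthetic sentence. Whether the paper uses exactly this formulation or a text-specific variant is an assumption not confirmed by the abstract.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

D is rewarded for assigning high probability to real sentences and low probability to generated ones, while G is rewarded for producing sentences that D misclassifies as real. The generated sentences are the candidate augmentation data.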
