CL · AI · Nov 2, 2023

Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia

Lucky Susanto, Ryandito Diandaru, Adila Krisnadhi, Ayu Purwarianti, Derry Wijaya

arXiv:2311.00998v1 · 20.5124 citations · h-index: 21 · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses benchmarking and data challenges for low-resource languages in Indonesia and offers practical guidance for researchers in similar contexts, though its contribution is incremental.

This paper tackles neural machine translation for low-resource local languages in Indonesia by analyzing training approaches for four languages, and finds that some of its systems achieve performance competitive with zero-shot GPT-3.5-turbo.

Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for a representative benchmark and limited data availability. This work addresses these challenges by comprehensively analyzing the training of NMT systems for four low-resource local languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our study covers various training approaches, paradigms, and data sizes, and includes a preliminary investigation into using large language models to generate synthetic parallel data for low-resource languages. We reveal specific trends and insights into practical strategies for low-resource language translation. Our research demonstrates that, despite limited computational resources and textual data, several of our NMT systems achieve competitive performance, rivaling the translation quality of zero-shot gpt-3.5-turbo. These findings significantly advance NMT for low-resource languages, offering valuable guidance for researchers in similar contexts.

