CL · Aug 17, 2025

LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages

arXiv:2508.12459v1 · 8.33 citations · h-index: 36 · EMNLP

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of benchmarking low-resource Indonesian languages for NLP researchers. It is incremental, however, because it builds on existing multilingual benchmark efforts.

The paper tackles the lack of NLP progress for Indonesia's diverse languages by introducing LoraxBench, a benchmark covering 20 Indonesian languages across 6 tasks. It finds the benchmark challenging. Performance differs markedly between Indonesian and the low-resource languages, and changes in register also affect model performance.

Indonesia is one of the world's most populous countries and has 700 spoken languages, yet it lags behind in NLP progress. We introduce LoraxBench, a benchmark that focuses on the low-resource languages of Indonesia. It covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset covers 20 languages, and three of these languages also include two formality registers each. We evaluate a diverse set of multilingual and region-focused LLMs and find that this benchmark is challenging. We note a visible discrepancy between performance in Indonesian and performance in the other languages, especially the low-resource ones. Region-specific models show no clear advantage over general multilingual models. Lastly, we show that a change in register affects model performance. The effect is strongest for registers not commonly found on social media, such as 'Krama' Javanese, a high-politeness register.
