CL, AI | Jul 29, 2025

IndoPref: A Multi-Domain Pairwise Preference Dataset for Indonesian

Vanessa Rebecca Wiyono, David Anugraha, Ayu Purwarianti, Genta Indra Winata

arXiv:2507.22159v2 | 2 citations | h-index: 42 | IJCNLP-AACL

Originality: Synthesis-oriented

AI Analysis

IndoPref provides a culturally authentic benchmark for evaluating LLMs in Indonesian, addressing a gap that affects over 200 million speakers. The contribution is incremental, however, since it applies existing methods to a new language.

The authors tackled the underrepresentation of Indonesian in preference-based LLM research by creating IndoPref, a fully human-authored, multi-domain dataset with 4,099 pairwise preferences across 522 prompts, achieving strong inter-annotator agreement.

Over 200 million people speak Indonesian, yet the language remains significantly underrepresented in preference-based research for large language models (LLMs). Most existing multilingual datasets are derived from English translations, often resulting in content that lacks cultural and linguistic authenticity. To address this gap, we introduce IndoPref, the first fully human-authored and multi-domain Indonesian preference dataset designed to evaluate the naturalness and quality of LLM-generated text. The dataset contains 522 prompts and yields 4,099 human-annotated pairwise preferences from comparisons across five instruction-tuned LLMs. All annotations are natively written in Indonesian with strong inter-annotator agreement, measured by Krippendorff's alpha. Our benchmark spans 10 diverse categories, enabling practitioners to identify LLMs' fine-grained strengths and weaknesses.
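Since the abstract reports agreement with Krippendorff's alpha, a minimal sketch of how that statistic can be computed may help readers interpret it. The version below assumes nominal labels (for example "A", "B", or "tie" for each pairwise comparison); the abstract does not state which variant or label set the authors used, so the function name `krippendorff_alpha_nominal`, the label values, and the toy ratings are all illustrative assumptions rather than the paper's procedure.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: one inner list per rated item, holding the
    labels given by whichever annotators rated it (missing ratings omitted).
    """
    # Only items with at least two ratings contribute pairable values.
    units = [u for u in units if len(u) >= 2]

    # Coincidence matrix: each ordered pair of ratings within an item
    # contributes 1 / (m - 1), where m is the number of ratings on that item.
    coincidences = Counter()
    for u in units:
        m = len(u)
        for i, j in permutations(range(m), 2):
            coincidences[(u[i], u[j])] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)

    if expected == 0:
        return 1.0  # only one label ever used; no disagreement is possible
    return 1.0 - observed / expected


# Hypothetical toy data: four comparisons, two annotators each.
toy_ratings = [["A", "A"], ["A", "B"], ["B", "B"], ["B", "B"]]
toy_alpha = krippendorff_alpha_nominal(toy_ratings)
perfect_alpha = krippendorff_alpha_nominal([["A", "A"], ["B", "B"], ["A", "A"]])
```

Values near 1 indicate agreement well above chance, while values near 0 indicate agreement no better than chance. If the preference labels were treated as ordered (for instance, graded strength of preference), an ordinal distance metric would be the more appropriate choice than the nominal one sketched here.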
