CL · May 24, 2025

Multilingual Question Answering in Low-Resource Settings: A Dzongkha-English Benchmark for Foundation Models

arXiv:2505.18638v2 · 5 citations · h-index: 11 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work evaluates LLM performance on Dzongkha, a low-resource language, using test questions written for Bhutanese students. The contribution is incremental: it is primarily a new benchmark.

The authors address multilingual question answering in low-resource settings by creating DZEN, a parallel Dzongkha-English dataset of over 5K questions. They find that Large Language Models (LLMs) perform significantly differently in English and Dzongkha, that Chain-of-Thought prompting helps on reasoning questions but less on factual ones, and that adding English translations improves the precision of responses to Dzongkha questions.

In this work, we provide DZEN, a dataset of parallel Dzongkha and English test questions for Bhutanese middle and high school students. The over 5K questions in our collection span a variety of scientific topics and include factual, application, and reasoning-based questions. We use our parallel dataset to test a number of Large Language Models (LLMs) and find a significant performance difference between the models in English and Dzongkha. We also look at different prompting strategies and discover that Chain-of-Thought (CoT) prompting works well for reasoning questions but less well for factual ones. We also find that adding English translations enhances the precision of Dzongkha question responses. Our results point to exciting avenues for further study to improve LLM performance in Dzongkha and, more generally, in low-resource languages. We release the dataset at: https://github.com/kraritt/llm_dzongkha_evaluation.
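The prompting strategies compared in the abstract (direct answering, Chain-of-Thought, and adding the English translation to a Dzongkha question) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ParallelQuestion` record, the `build_prompt` and `accuracy` helpers, and the prompt wording are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class ParallelQuestion:
    """Hypothetical record for one parallel item; field names are assumptions."""
    dzongkha: str
    english: str


def build_prompt(q: ParallelQuestion, strategy: str = "direct",
                 add_english: bool = False) -> str:
    """Assemble a prompt for a Dzongkha question.

    strategy: "direct" asks for the answer only; "cot" asks the model to
    reason step by step first. The wording is illustrative, not the paper's.
    add_english: also include the parallel English translation.
    """
    lines = ["Question (Dzongkha): " + q.dzongkha]
    if add_english:
        lines.append("Question (English translation): " + q.english)
    if strategy == "cot":
        lines.append("Think step by step, then state the final answer.")
    elif strategy == "direct":
        lines.append("State the final answer only.")
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return "\n".join(lines)


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of exact matches after trimming whitespace and lowercasing."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(predictions, gold))
    return hits / len(gold)
```

Computing `accuracy` separately over the English and the Dzongkha answers to the same parallel items would give the kind of per-language comparison the abstract describes. Exact string matching is a simplification chosen here; the paper's actual scoring method is not stated in this summary.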
