CL | Mar 15, 2025

TLUE: A Tibetan Language Understanding Evaluation Benchmark

arXiv:2503.12051v5 | 8 citations | h-index: 17 | EMNLP
Originality: Synthesis-oriented
AI Analysis

This paper addresses the lack of inclusivity for Tibetan speakers by providing a much-needed evaluation tool. The contribution is incremental, since it applies existing benchmark methods to a new language.

The authors tackle the underrepresentation of Tibetan in large language model evaluation by creating TLUE, the first large-scale Tibetan language understanding benchmark. They find that most state-of-the-art models perform below the random baseline, which indicates significant difficulty in processing Tibetan.

Large language models have made tremendous progress in recent years, but low-resource languages, like Tibetan, remain significantly underrepresented in their evaluation. Despite Tibetan being spoken by over seven million people, it has largely been neglected in the development and assessment of large language models. To address this gap, we present a Tibetan Language Understanding Evaluation Benchmark, TLUE, the first large-scale benchmark for measuring the proficiency of LLMs in the Tibetan language. TLUE comprises two major components: a comprehensive multi-task understanding benchmark spanning 5 domains and 67 subdomains, and a safety benchmark encompassing 7 subdomains. Then, we evaluate a diverse set of state-of-the-art large language models. Experimental results demonstrate that most large language models perform below the random baseline, highlighting the considerable challenges they face in Tibetan language processing. TLUE provides a crucial foundation for advancing future research in Tibetan language understanding and highlights the importance of promoting greater inclusivity in the development of large language models.
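To make the "below the random baseline" claim concrete, here is a minimal sketch of how such a comparison might be computed for a multiple-choice benchmark. It assumes the random baseline is uniform guessing over the answer options (1/k for k options). The four-option format, the function names, and the sample data are all hypothetical illustrations, not details taken from TLUE.

```python
from typing import Sequence


def random_baseline(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 2:
        raise ValueError("a multiple-choice question needs at least 2 options")
    return 1.0 / num_options


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold answers."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and equal length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def below_random(predictions: Sequence[str], answers: Sequence[str],
                 num_options: int) -> bool:
    """True when measured accuracy falls under the uniform-guessing baseline."""
    return accuracy(predictions, answers) < random_baseline(num_options)


# Hypothetical toy data: 8 four-option questions, 1 answered correctly.
answers = ["A", "B", "C", "D", "A", "B", "C", "D"]
predictions = ["A", "C", "D", "B", "B", "A", "D", "C"]

print(accuracy(predictions, answers))         # 1 of 8 correct
print(random_baseline(4))                     # uniform guessing over 4 options
print(below_random(predictions, answers, 4))  # accuracy is under the baseline
```

A score under 1/k on a reasonably large test set suggests the model is not merely guessing. It may be systematically misreading the questions or failing to produce a parseable answer at all, which an evaluation harness typically counts as incorrect.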
