CL AI · Feb 19, 2025

Batayan: A Filipino NLP benchmark for evaluating Large Language Models

Jann Railey Montalan, Jimson Paulo Layacan, David Demitri Africa, Richell Isaiah Flores, Michael T. Lopez, Theresa Denise Magsajo, Anjanette Cayabyab, William Chandra Tjhi

arXiv:2502.14911v2 · 5 citations · h-index: 12 · Has Code · ACL

Originality: Synthesis-oriented

AI Analysis

This work addresses the under-representation of Filipino speakers in NLP, though its contribution is incremental: it applies existing benchmark methodologies to a new language.

The authors address the lack of evaluation benchmarks for under-resourced languages by introducing Batayan, a Filipino NLP benchmark for LLMs. Their evaluation reveals significant performance gaps across models, with accuracy on some tasks dropping by more than 20% relative to high-resource languages.

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities on widely benchmarked high-resource languages. However, linguistic nuances of under-resourced languages remain unexplored. We introduce Batayan, a holistic Filipino benchmark that systematically evaluates LLMs across three key natural language processing (NLP) competencies: understanding, reasoning, and generation. Batayan consolidates eight tasks, three of which did not previously exist for Filipino corpora, covering both Tagalog and code-switched Taglish utterances. Our rigorous, native-speaker-driven adaptation and validation processes ensure fluency and authenticity to the complex morphological and syntactic structures of Filipino, alleviating the pervasive translationese bias in existing Filipino corpora. We report empirical results on a variety of open-source and commercial LLMs, highlighting significant performance gaps that signal the under-representation of Filipino in pre-training corpora, the unique hurdles in modeling Filipino's rich morphology and construction, and the importance of explicit Filipino language support. Moreover, we discuss the practical challenges encountered in dataset construction and propose principled solutions for building culturally and linguistically faithful resources in under-represented languages. We also provide a public evaluation suite as a clear foundation for iterative, community-driven progress in Filipino NLP.
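One way to picture how per-task results could roll up into the three competencies named in the abstract is the minimal sketch below. It is an illustration under stated assumptions, not the authors' scoring method: the abstract does not say how scores are aggregated, and the task names, the task-to-competency mapping, and the equal-weight macro-average are all hypothetical placeholders.

```python
from statistics import mean

# Hypothetical mapping: these task names are placeholders, not the
# benchmark's actual eight tasks, and the grouping is assumed.
COMPETENCY_OF = {
    "task_a": "understanding",
    "task_b": "understanding",
    "task_c": "reasoning",
    "task_d": "generation",
}


def competency_scores(task_scores):
    """Average the per-task scores within each competency."""
    grouped = {}
    for task, score in task_scores.items():
        grouped.setdefault(COMPETENCY_OF[task], []).append(score)
    return {name: mean(scores) for name, scores in grouped.items()}


def overall_score(task_scores):
    """Macro-average: the mean of the competency means.

    Each competency counts equally regardless of how many tasks it has.
    This weighting is an assumption made for illustration.
    """
    return mean(competency_scores(task_scores).values())
```

Averaging within each competency first keeps a competency with many tasks from dominating the overall number; a benchmark could just as reasonably report the per-competency scores alone.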
