CL · Nov 7, 2024

ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding

arXiv:2411.05049v3 · 17 citations · h-index: 39 · Has Code · NAACL
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of reliable LLM evaluation for low-resource languages. The contribution is incremental, since it builds on existing benchmarking efforts.

The paper tackles the challenge of evaluating LLMs for low-resource language understanding by introducing ProverbEval, a benchmark focusing on culture-specific scenarios, and finds performance variances of up to 50% due to factors like answer choice order and prompt language.

With the rapid development of evaluation datasets to assess LLMs' understanding across a wide range of subjects and domains, identifying a suitable language understanding benchmark has become increasingly challenging. In this work, we explore LLM evaluation challenges for low-resource language understanding and introduce ProverbEval, an LLM evaluation benchmark for low-resource languages that focuses on language understanding in culture-specific scenarios. We benchmark various LLMs and explore factors that create variability in the benchmarking process. We observed performance variances of up to 50%, depending on the order in which answer choices were presented in multiple-choice tasks. Native language proverb descriptions significantly improve performance on tasks such as proverb generation. Additionally, monolingual evaluations consistently outperformed their cross-lingual counterparts in generation tasks. We argue that special attention must be given to the order of choices, the choice of prompt language, task variability, and generation tasks when creating LLM evaluation benchmarks. Evaluation data is available at https://huggingface.co/datasets/israel/ProverbEval, and the evaluation code at https://github.com/EthioNLP/EthioProverbEval.
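The sensitivity to answer-choice order described in the abstract can be illustrated with a small sketch. This is not the paper's evaluation code: the function names `accuracy_spread` and `always_first`, the `(question, choices, correct_index)` item layout, and the toy predictor are all assumptions made for illustration. The sketch scores a predictor under every ordering of the answer choices and reports the lowest and highest accuracy observed, which is one simple way to expose an order-dependent gap.

```python
from itertools import permutations


def accuracy_spread(items, predict, n_choices=4):
    """Return (min, max) accuracy over all orderings of the answer choices.

    items: list of (question, choices, correct_index) tuples (hypothetical layout).
    predict: callable(question, choices) -> index of the chosen option.
    Assumes the choices within each item are distinct strings.
    """
    accuracies = []
    for perm in permutations(range(n_choices)):
        hits = 0
        for question, choices, correct_index in items:
            # Present the same choices in the order given by this permutation.
            shuffled = [choices[i] for i in perm]
            picked = predict(question, shuffled)
            if shuffled[picked] == choices[correct_index]:
                hits += 1
        accuracies.append(hits / len(items))
    return min(accuracies), max(accuracies)


def always_first(question, choices):
    """Toy stand-in for a model with an extreme position bias (assumption)."""
    return 0


if __name__ == "__main__":
    toy_items = [
        ("q1", ["a", "b", "c", "d"], 0),
        ("q2", ["w", "x", "y", "z"], 0),
    ]
    low, high = accuracy_spread(toy_items, always_first)
    print(f"accuracy ranges from {low:.2f} to {high:.2f} across choice orders")
```

In practice `predict` would wrap a call to the LLM under evaluation. Enumerating all orderings grows factorially with the number of choices, so a sampled subset of orderings may be preferable for larger option sets.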

