CL · Sep 10, 2025

BRoverbs -- Measuring how much LLMs understand Portuguese proverbs

Thales Sales Almeida, Giovana Kerche Bonás, João Guilherme Alves Santos

arXiv:2509.08960v1 · 8.33 citations · h-index: 4 · Has Code · J Braz Comput Soc

Originality: Synthesis-oriented

AI Analysis

The paper addresses the scarcity of Portuguese-language evaluation resources available to LLM researchers, though its contribution is incremental because it focuses on a single cultural domain.

The paper tackles the lack of mature evaluation frameworks for LLMs in Portuguese by introducing BRoverbs, a dataset using Brazilian proverbs to assess performance, resulting in a new benchmark tool for regionally informed evaluation.

Large Language Models (LLMs) exhibit significant performance variations depending on the linguistic and cultural context in which they are applied. This disparity signals the necessity of mature evaluation frameworks that can assess their capabilities in specific regional settings. In the case of Portuguese, existing evaluations remain limited, often relying on translated datasets that may not fully capture linguistic nuances or cultural references. Meanwhile, native Portuguese-language datasets predominantly focus on structured national exams or sentiment analysis of social media interactions, leaving gaps in evaluating broader linguistic understanding. To address this limitation, we introduce BRoverbs, a dataset specifically designed to assess LLM performance through Brazilian proverbs. Proverbs serve as a rich linguistic resource, encapsulating cultural wisdom, figurative expressions, and complex syntactic structures that challenge a model's comprehension of regional expressions. BRoverbs aims to provide a new evaluation tool for Portuguese-language LLMs, contributing to the advancement of regionally informed benchmarking. The benchmark is available at https://huggingface.co/datasets/Tropic-AI/BRoverbs.
