CL, AI · Nov 25, 2025

BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali

arXiv:2511.20399v2 · 2 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of evaluating LLM robustness in low-resource cultural contexts, a concern for NLP researchers. The contribution is incremental, as it builds on existing multilingual benchmark efforts.

The authors tackled the lack of evaluation for figurative and culturally grounded reasoning in low-resource languages by creating BengaliFig, a dataset of 435 Bengali riddles, and found that frontier LLMs show consistent weaknesses in metaphorical and culturally specific reasoning under zero-shot and few-shot prompting.

Large language models excel on broad multilingual benchmarks but have not yet been extensively evaluated on figurative and culturally grounded reasoning, especially in low-resource contexts. We present BengaliFig, a compact yet richly annotated challenge set that targets this gap in Bengali, a widely spoken low-resource language. The dataset contains 435 unique riddles drawn from Bengali oral and literary traditions. Each item is annotated along five orthogonal dimensions capturing reasoning type, trap type, cultural depth, answer category, and difficulty, and is automatically converted to multiple-choice format through a constraint-aware, AI-assisted pipeline. We evaluate eight frontier LLMs from major providers under zero-shot and few-shot chain-of-thought prompting, revealing consistent weaknesses in metaphorical and culturally specific reasoning. BengaliFig thus contributes both a diagnostic probe for evaluating LLM robustness in low-resource cultural contexts and a step toward inclusive and heritage-aware NLP evaluation.
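
To make the annotation scheme concrete, here is a minimal Python sketch of how one multiple-choice item with the five annotation dimensions might be represented, and how accuracy could be broken down along any one dimension. This is an illustration only, not the authors' released format or evaluation code: the class name `RiddleItem`, its field names, the helper `accuracy_by_dimension`, and any label values passed in are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RiddleItem:
    """One multiple-choice riddle (hypothetical schema, not the paper's)."""
    question: str
    options: tuple[str, ...]   # candidate answers shown to the model
    answer_index: int          # position of the correct option
    # The five annotation dimensions named in the abstract.
    reasoning_type: str
    trap_type: str
    cultural_depth: str
    answer_category: str
    difficulty: str


def accuracy_by_dimension(items, predictions, dimension):
    """Return {label: accuracy} for one annotation dimension.

    `predictions` holds the option index chosen by the model for each item.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted in zip(items, predictions):
        label = getattr(item, dimension)
        total[label] += 1
        correct[label] += int(predicted == item.answer_index)
    return {label: correct[label] / total[label] for label in total}
```

Calling `accuracy_by_dimension(items, predictions, "reasoning_type")` would give one accuracy per reasoning-type label, which is the kind of per-dimension breakdown that would surface weaknesses on metaphorical or culturally specific items.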
