CL Aug 12, 2025

Reveal-Bangla: A Dataset for Cross-Lingual Multi-Step Reasoning Evaluation

Khondoker Ittehadul Islam, Gabriele Sarti

arXiv:2508.08933v24.91 citations h-index: 1

Originality Synthesis-oriented

AI Analysis

The paper addresses cross-lingual reasoning evaluation for low-resource languages such as Bangla, but the contribution is incremental: it adapts an existing dataset and method.

The authors address the lack of multi-step reasoning evaluation in low-resource languages by manually translating the English Reveal dataset into Bangla. They find that models benefit from reasoning context on non-binary questions but struggle to use Bangla reasoning steps effectively.

Language models have demonstrated remarkable performance on complex multi-step reasoning tasks. However, their evaluation has been predominantly confined to high-resource languages such as English. In this paper, we introduce a manually translated Bangla multi-step reasoning dataset derived from the English Reveal dataset, featuring both binary and non-binary question types. We conduct a controlled evaluation of English-centric and Bangla-centric multilingual small language models on the original dataset and our translated version to compare their ability to exploit relevant reasoning steps to produce correct answers. Our results show that, in comparable settings, reasoning context is beneficial for more challenging non-binary questions, but models struggle to employ relevant Bangla reasoning steps effectively. We conclude by exploring how reasoning steps contribute to models' predictions, highlighting different trends across models and languages.
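The comparison described in the abstract (answer accuracy with and without reasoning steps, split by question type) can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the record fields (`question_type`, `with_reasoning`, `prediction`, `gold`), the helper names, and the toy records are all hypothetical placeholders and do not reproduce any result from the paper.

```python
from collections import defaultdict


def accuracy_by_setting(records):
    """Return {(question_type, with_reasoning): accuracy} over the records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        key = (r["question_type"], r["with_reasoning"])
        totals[key] += 1
        hits[key] += int(r["prediction"] == r["gold"])
    return {key: hits[key] / totals[key] for key in totals}


def reasoning_gain(acc, question_type):
    """Accuracy with reasoning context minus accuracy without it."""
    return acc[(question_type, True)] - acc[(question_type, False)]


# Hypothetical toy records (made-up values, for illustration only).
records = [
    {"question_type": "binary", "with_reasoning": False, "prediction": "yes", "gold": "yes"},
    {"question_type": "binary", "with_reasoning": False, "prediction": "no", "gold": "yes"},
    {"question_type": "binary", "with_reasoning": True, "prediction": "yes", "gold": "yes"},
    {"question_type": "binary", "with_reasoning": True, "prediction": "no", "gold": "yes"},
    {"question_type": "non_binary", "with_reasoning": False, "prediction": "Dhaka", "gold": "Dhaka"},
    {"question_type": "non_binary", "with_reasoning": False, "prediction": "Khulna", "gold": "Sylhet"},
    {"question_type": "non_binary", "with_reasoning": True, "prediction": "Dhaka", "gold": "Dhaka"},
    {"question_type": "non_binary", "with_reasoning": True, "prediction": "Sylhet", "gold": "Sylhet"},
]

acc = accuracy_by_setting(records)
for qtype in ("binary", "non_binary"):
    print(qtype, reasoning_gain(acc, qtype))
```

Running the same computation separately on the English and Bangla versions of each question would give the per-language comparison the abstract refers to, assuming predictions are collected under otherwise identical prompts.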
