CL · Mar 5

NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance

Abrar Eyasir, Tahsin Ahmed, Muhammad Ibrahim

arXiv:2603.05462v1 0.6

Originality: Incremental advance

AI Analysis

This dataset and benchmark target a known weakness of reading comprehension systems for low-resource languages such as Bangla: unreliable responses to unanswerable questions. The contribution is an incremental advance for NLP researchers working on these languages.

This paper introduces NCTB-QA, a large-scale Bangla question answering dataset with 87,805 question-answer pairs, specifically designed to address unanswerable questions in low-resource languages. By fine-tuning transformer-based models on this dataset, the authors achieved a 313% relative improvement in F1 score for BERT, from 0.150 to 0.620.
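The 313% figure can be checked directly from the two reported F1 scores. The snippet below is a minimal sketch of that arithmetic; the function name `relative_improvement` is illustrative and not taken from the paper.

```python
def relative_improvement(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Reported BERT F1 before and after fine-tuning on NCTB-QA.
bert_gain = relative_improvement(0.150, 0.620)
print(f"BERT relative F1 improvement: {bert_gain:.0f}%")
```

Note that this is a relative gain; the absolute gain is 0.470 F1 points.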

Reading comprehension systems for low-resource languages face significant challenges in handling unanswerable questions. These systems tend to produce unreliable responses when correct answers are absent from context. To solve this problem, we introduce NCTB-QA, a large-scale Bangla question answering dataset comprising 87,805 question-answer pairs extracted from 50 textbooks published by Bangladesh's National Curriculum and Textbook Board. Unlike existing Bangla datasets, NCTB-QA maintains a balanced distribution of answerable (57.25%) and unanswerable (42.75%) questions. NCTB-QA also includes adversarially designed instances containing plausible distractors. We benchmark three transformer-based models (BERT, RoBERTa, ELECTRA) and demonstrate substantial improvements through fine-tuning. BERT achieves 313% relative improvement in F1 score (0.150 to 0.620). Semantic answer quality measured by BERTScore also increases significantly across all models. Our results establish NCTB-QA as a challenging benchmark for Bangla educational question answering. This study demonstrates that domain-specific fine-tuning is critical for robust performance in low-resource settings.
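The abstract does not say how F1 is computed when a question has no answer. As a hedged illustration only, the sketch below follows the common SQuAD 2.0-style convention: token-overlap F1 for answerable questions, and a score of 1.0 for an unanswerable question only when the model also returns an empty answer. The whitespace tokenization and the function name `token_f1` are assumptions, not the authors' evaluation code.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string.

    An empty gold string marks an unanswerable question (assumed convention).
    """
    pred_tokens = prediction.split()
    gold_tokens = gold.split()

    # If either side is empty, the score is 1.0 only when both are empty,
    # i.e. the model correctly abstained on an unanswerable question.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Count tokens shared by prediction and gold, respecting multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Correct abstention on an unanswerable question.
abstain_score = token_f1("", "")
# A spurious answer to an unanswerable question.
spurious_score = token_f1("ঢাকা", "")
# A partially correct answer to an answerable question.
partial_score = token_f1("ঢাকা শহর", "ঢাকা")
```

Under this convention, a model that always produces some span is penalized on every unanswerable item, which is why a dataset with 42.75% unanswerable questions can sharply lower the scores of models not trained to abstain.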
