CL · AI · Oct 16, 2025

FarsiMCQGen: a Persian Multiple-choice Question Generation Framework

arXiv:2510.15134v1 · h-index: 22
Originality: Synthesis-oriented
AI Analysis

This work targets more efficient educational testing for Persian-language learners. Its contribution is incremental: it adapts existing methods to a new language and dataset.

The paper tackles the generation of high-quality multiple-choice questions (MCQs) in Persian, a low-resource language. It introduces FarsiMCQGen, a framework that combines candidate generation, filtering, and ranking, and it presents a new Persian MCQ dataset of 10,289 questions that is evaluated with state-of-the-art large language models.

Multiple-choice questions (MCQs) are commonly used in educational testing, as they offer an efficient means of evaluating learners' knowledge. However, generating high-quality MCQs, particularly in low-resource languages such as Persian, remains a significant challenge. This paper introduces FarsiMCQGen, an innovative approach for generating Persian-language MCQs. Our methodology combines candidate generation, filtering, and ranking techniques to build a model that generates answer choices resembling those in real MCQs. We leverage advanced methods, including Transformers and knowledge graphs, integrated with rule-based approaches to craft credible distractors that challenge test-takers. Our work is based on data from Wikipedia, which includes general knowledge questions. Furthermore, this study introduces a novel Persian MCQ dataset comprising 10,289 questions. This dataset is evaluated by different state-of-the-art large language models (LLMs). Our results demonstrate the effectiveness of our model and the quality of the generated dataset, which has the potential to inspire further research on MCQs.
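The abstract names three stages (candidate generation, filtering, and ranking) without detailing them here. The following is a minimal Python sketch of how such a distractor pipeline could fit together, not FarsiMCQGen's actual method. It assumes a hypothetical toy knowledge graph `KG` whose made-up vectors stand in for Transformer embeddings; the same-type generation rule, the `difflib` near-duplicate filter, the cosine-similarity ranking, and the names `generate`, `keep`, `rank`, and `distractors` are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical toy knowledge graph: entity -> (type, embedding).
# The vectors are made up and only stand in for Transformer embeddings.
KG = {
    "Tehran":  ("city", [0.9, 0.8, 0.1]),
    "Teheran": ("city", [0.9, 0.8, 0.1]),  # spelling variant of the answer
    "Mashhad": ("city", [0.7, 0.6, 0.2]),
    "Isfahan": ("city", [0.8, 0.7, 0.3]),
    "Shiraz":  ("city", [0.6, 0.5, 0.4]),
    "Tabriz":  ("city", [0.5, 0.6, 0.2]),
    "Zagros":  ("mountain", [0.2, 0.1, 0.9]),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def generate(answer):
    """Candidate generation (rule-based): other entities of the answer's type."""
    answer_type = KG[answer][0]
    return [e for e, (t, _) in KG.items() if t == answer_type and e != answer]


def keep(candidate, answer, stem, max_sim=0.8):
    """Filtering: drop near-duplicates of the answer and words already in the stem."""
    ratio = SequenceMatcher(None, candidate.lower(), answer.lower()).ratio()
    return ratio <= max_sim and candidate.lower() not in stem.lower()


def rank(candidates, answer, k=3):
    """Ranking: candidates most similar to the answer come first (more plausible)."""
    target = KG[answer][1]
    return sorted(candidates, key=lambda c: cosine(KG[c][1], target), reverse=True)[:k]


def distractors(stem, answer, k=3):
    """Full pipeline: generate -> filter -> rank, returning the top-k distractors."""
    pool = [c for c in generate(answer) if keep(c, answer, stem)]
    return rank(pool, answer, k)


print(distractors("Which city is the capital of Iran?", "Tehran"))
# → ['Mashhad', 'Isfahan', 'Tabriz']
```

In this toy run the type rule excludes "Zagros", the filter drops the spelling variant "Teheran" (string similarity about 0.92), and ranking keeps the three cities whose vectors lie closest to the answer's. A real system would replace each stage with learned components.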

