FarsiMCQGen: a Persian Multiple-choice Question Generation Framework
This work targets more efficient educational testing for Persian-language learners. Its contribution is incremental: it adapts existing methods to a new language and dataset.
The paper tackles the challenge of generating high-quality multiple-choice questions (MCQs) in Persian, a low-resource language. It introduces FarsiMCQGen, a framework that combines candidate generation, filtering, and ranking techniques, and it presents a novel Persian MCQ dataset of 10,289 questions evaluated by state-of-the-art large language models.
Multiple-choice questions (MCQs) are commonly used in educational testing, as they offer an efficient means of evaluating learners' knowledge. However, generating high-quality MCQs, particularly in low-resource languages such as Persian, remains a significant challenge. This paper introduces FarsiMCQGen, an innovative approach for generating Persian-language MCQs. Our methodology combines candidate generation, filtering, and ranking techniques to build a model that generates answer choices resembling those in real MCQs. We leverage advanced methods, including Transformers and knowledge graphs, integrated with rule-based approaches to craft credible distractors that challenge test-takers. Our work is based on data from Wikipedia, which includes general knowledge questions. Furthermore, this study introduces a novel Persian MCQ dataset comprising 10,289 questions. This dataset is evaluated by different state-of-the-art large language models (LLMs). Our results demonstrate the effectiveness of our model and the quality of the generated dataset, which has the potential to inspire further research on MCQs.
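The three stages named in the abstract (candidate generation, filtering, ranking) can be illustrated with a minimal sketch. This is not the FarsiMCQGen implementation: the function names, the fixed candidate pool, and the character-bigram similarity are all assumptions made for illustration. In particular, the bigram score is only a toy stand-in for the Transformer and knowledge-graph signals the abstract mentions, and the example uses Latin-script city names for readability.

```python
from __future__ import annotations

from typing import Iterable


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of a normalized string."""
    s = text.strip().lower()
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams.

    Assumption: a toy stand-in for an embedding-based similarity score.
    """
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def generate_candidates(answer: str, pool: Iterable[str]) -> list[str]:
    """Stage 1: gather candidate distractors.

    Assumption: candidates come from a fixed hypothetical pool; a real
    system might query a knowledge graph or a language model instead.
    """
    return list(pool)


def filter_candidates(answer: str, candidates: Iterable[str]) -> list[str]:
    """Stage 2: rule-based filtering (drop blanks, the answer, duplicates)."""
    answer_key = answer.strip().lower()
    seen: set[str] = set()
    kept: list[str] = []
    for cand in candidates:
        key = cand.strip().lower()
        if not key or key == answer_key or key in seen:
            continue
        seen.add(key)
        kept.append(cand.strip())
    return kept


def rank_candidates(answer: str, candidates: Iterable[str], k: int = 3) -> list[str]:
    """Stage 3: keep the k candidates most similar to the correct answer."""
    ranked = sorted(candidates, key=lambda c: similarity(answer, c), reverse=True)
    return ranked[:k]


def make_distractors(answer: str, pool: Iterable[str], k: int = 3) -> list[str]:
    """Run the three stages in order and return up to k distractors."""
    candidates = generate_candidates(answer, pool)
    filtered = filter_candidates(answer, candidates)
    return rank_candidates(answer, filtered, k)


if __name__ == "__main__":
    # Hypothetical question: "What is the capital of Iran?"
    pool = ["Tehran", "Tabriz", "Shiraz", "tabriz", "", "Isfahan", "Mashhad"]
    print(make_distractors("Tehran", pool, k=3))
```

Separating the stages keeps each one replaceable: a stronger generator or a learned ranker could be substituted without touching the filtering rules.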