CL | AI | Apr 16, 2024

Question Difficulty Ranking for Multiple-Choice Reading Comprehension

arXiv:2404.10704v1 | 9 citations | h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the need of English test creators for scalable, cost-effective question difficulty ranking. It is incremental in that it builds on existing task transfer and prompting techniques.

The paper tackles the problem of automatically ranking multiple-choice reading comprehension questions by difficulty, with the aim of reducing reliance on expensive human pretesting. It finds that zero-shot comparative assessment with language models achieves a Spearman's correlation of 40.4% and that combining the systems improves the correlation further.

Multiple-choice (MC) tests are an efficient way to assess English learners. During exam curation, it is useful for test creators to rank candidate MC questions by difficulty. Difficulty is typically determined by having human test takers trial the questions in a pretesting stage, but this is expensive and does not scale. We therefore explore automated approaches to ranking MC questions by difficulty. Because little data is available for explicitly training a system to predict difficulty scores, we compare task transfer and zero-shot approaches: task transfer adapts level classification and reading comprehension systems for difficulty ranking, while zero-shot prompting of instruction-finetuned language models contrasts absolute assessment with comparative assessment. Level classification is found to transfer better than reading comprehension. Zero-shot comparative assessment is also more effective at difficulty ranking than both absolute assessment and the task transfer approaches, achieving a Spearman's correlation of 40.4%. Combining the systems is observed to boost the correlation further.
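To make the evaluation setup concrete, the following is a minimal sketch of how pairwise comparative judgements could be turned into a difficulty ranking and scored with Spearman's correlation. It is an illustration under stated assumptions, not the paper's implementation: the function names (`win_rate_scores`, `spearman`, `toy_judge`) are hypothetical, the aggregation is a simple average win rate over both orderings of each pair, and the language-model comparison is replaced by a toy word-count stand-in.

```python
from itertools import permutations
from typing import Callable, Sequence


def win_rate_scores(
    items: Sequence[str], harder_prob: Callable[[str, str], float]
) -> list[float]:
    """Score each item by how often it is judged harder than the others.

    harder_prob(a, b) is assumed to return P(a is harder than b).
    Both orderings of every pair are queried, which averages out any
    sensitivity of the judge to presentation order.
    """
    n = len(items)
    totals = [0.0] * n
    for i, j in permutations(range(n), 2):
        p = harder_prob(items[i], items[j])
        totals[i] += p        # evidence that item i is harder
        totals[j] += 1.0 - p  # complementary evidence for item j
    # Each item takes part in 2 * (n - 1) ordered comparisons.
    return [t / (2 * (n - 1)) for t in totals]


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos : end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def toy_judge(a: str, b: str) -> float:
    """Hypothetical stand-in for a language-model comparison:
    treats the longer question as the harder one."""
    la, lb = len(a.split()), len(b.split())
    return la / (la + lb)


if __name__ == "__main__":
    questions = ["short one", "a somewhat longer question", "the longest question of all three here"]
    human_difficulty = [0.2, 0.5, 0.9]  # made-up reference scores for illustration
    predicted = win_rate_scores(questions, toy_judge)
    print(spearman(predicted, human_difficulty))
```

In practice `toy_judge` would be replaced by a call that prompts an instruction-finetuned model with two questions and reads off the probability that the first is harder. The reference scores would come from pretesting data rather than the made-up values used here.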
