CL, AI | Apr 10, 2023

DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach

arXiv:2304.04881v10.94 citations | h-index: 19

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable evaluation in educational assessment tools, though the advance is incremental because it builds on existing distractor generation models.

The authors tackle the problem of evaluating generated distractors for multiple-choice questions. They propose DISTO, a learned metric whose scores correlate highly with human ratings, and use it to show the flaws of applying machine translation metrics to this task.

Multiple choice questions (MCQs) are an efficient and common way to assess reading comprehension (RC). Every MCQ needs a set of distractor answers that are incorrect, but plausible enough to test student knowledge. Distractor generation (DG) models have been proposed, and their performance is typically evaluated using machine translation (MT) metrics. However, MT metrics often misjudge the suitability of generated distractors. We propose DISTO: the first learned evaluation metric for generated distractors. We validate DISTO by showing its scores correlate highly with human ratings of distractor quality. At the same time, DISTO ranks the performance of state-of-the-art DG models very differently from MT-based metrics, showing that MT metrics should not be used for distractor evaluation.
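The validation step described in the abstract, checking how well a metric's scores track human ratings, can be illustrated with a small correlation computation. This is a minimal sketch, not the authors' procedure: the abstract does not say which correlation coefficient is used, so Pearson's r is assumed here, and the function name `pearson` and all rating and score values are made-up toy data for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Unnormalized covariance and variances; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical, not from the paper): human quality ratings for five
# distractors, plus scores from two imaginary automatic metrics.
human_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_a = [0.1, 0.3, 0.5, 0.6, 0.9]  # roughly follows the human ratings
metric_b = [0.7, 0.2, 0.6, 0.3, 0.5]  # largely unrelated to the human ratings

r_a = pearson(metric_a, human_ratings)
r_b = pearson(metric_b, human_ratings)
print(f"metric_a vs. human: r = {r_a:.3f}")
print(f"metric_b vs. human: r = {r_b:.3f}")
```

Under this kind of check, a metric that tracks human judgments shows a correlation close to 1, while a metric that misjudges distractor suitability shows a weak or negative one.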
