CL | AI | LG | Apr 20, 2024

UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions

arXiv:2404.13343v1 | 27 citations | h-index: 8 | Has Code | BEA
Originality: Synthesis-oriented
AI Analysis

This work addresses automated assessment in medical licensing exams, but it is incremental as it applies existing LLM and transformer methods to a specific dataset.

The authors predict item difficulty and response time for USMLE multiple-choice questions, using LLMs for data augmentation together with transformer models. They find that difficulty is harder to predict than response time, and that including the question text and the variability of LLM answers improves results.

This work explores a novel data augmentation method based on Large Language Models (LLMs) for predicting item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task. Our approach augments the dataset with answers from zero-shot LLMs (Falcon, Meditron, Mistral) and employs transformer-based models built on six alternative feature combinations. The results suggest that predicting the difficulty of questions is more challenging than predicting response time. Notably, our top performing methods consistently include the question text, and benefit from the variability of LLM answers, highlighting the potential of LLMs for improving automated assessment in medical licensing exams. We make our code available at https://github.com/ana-rogoz/BEA-2024.
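To make the augmentation idea concrete, here is a minimal sketch of how zero-shot LLM answers might be attached to an item and how their variability could be summarized. It is not the authors' implementation: the function names `answer_variability` and `build_input`, the field names, the `[SEP]` joining convention, the disagreement measure, and the toy item are all assumptions made for illustration. The actual feature combinations are defined in the paper and the linked repository.

```python
from collections import Counter


def answer_variability(llm_answers):
    """Fraction of LLM answers that differ from the most common answer.

    Hypothetical measure: 0.0 means all models agree.
    """
    if not llm_answers:
        raise ValueError("need at least one LLM answer")
    counts = Counter(llm_answers.values())
    top_count = counts.most_common(1)[0][1]
    return 1.0 - top_count / len(llm_answers)


def build_input(item, llm_answers, fields):
    """Join the selected fields into one text input for a transformer model.

    `fields` picks one feature combination; the special name "llm_answers"
    appends each model's zero-shot answer in a fixed (sorted) order.
    """
    parts = []
    for name in fields:
        if name == "llm_answers":
            parts.extend(f"{model}: {ans}" for model, ans in sorted(llm_answers.items()))
        else:
            parts.append(item[name])
    return " [SEP] ".join(parts)


# Toy item and answers, invented for illustration only.
item = {
    "question": "Which nerve is most likely injured?",
    "options": "A) Median B) Ulnar C) Radial",
}
llm_answers = {"falcon": "B", "meditron": "C", "mistral": "C"}

text = build_input(item, llm_answers, ["question", "llm_answers"])
variability = answer_variability(llm_answers)
```

In this sketch, `text` would be passed to a transformer regressor and `variability` could serve as an extra signal; items on which the models disagree may plausibly be harder, though that link is an assumption here rather than a result quoted from the paper.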

Code Implementations: 1 repo
