CL · AI · LG · Mar 27, 2022

MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

arXiv:2203.14371v1 · 708 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

It provides a new benchmark for evaluating AI models on real-world medical domain tasks, addressing a gap in high-quality, diverse medical QA data.

The paper introduces MedMCQA, a large-scale multiple-choice question answering dataset with over 194k medical exam questions covering 2.4k topics and 21 subjects, designed to test deep language understanding and reasoning abilities.

This paper introduces MedMCQA, a new large-scale Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. It collects more than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects, with an average token length of 12.77 and high topical diversity. Each sample contains a question, the correct answer(s), and other options, and requires deeper language understanding because it tests 10+ reasoning abilities of a model across a wide range of medical subjects & topics. A detailed explanation of the solution is provided along with this information.
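The sample structure described in the abstract (question, options, correct answer, explanation, subject, topic) can be pictured with a minimal Python sketch. The class name `MCQSample`, its field names, and the `accuracy` helper are hypothetical and chosen for illustration; they are not the dataset's actual schema or the authors' evaluation code. The sketch also assumes exactly one correct option per question, and the two toy questions are invented rather than taken from MedMCQA.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class MCQSample:
    """One multiple-choice item (hypothetical field names, not the real schema)."""
    question: str
    options: Tuple[str, ...]   # all answer choices, including the correct one
    answer_index: int          # position of the correct choice in `options`
    explanation: str           # free-text rationale for the correct choice
    subject: str               # e.g. one of the 21 medical subjects
    topic: str                 # finer-grained topic within the subject


def accuracy(samples: Sequence[MCQSample],
             predict: Callable[[MCQSample], int]) -> float:
    """Fraction of samples where `predict` returns the correct option index."""
    if not samples:
        raise ValueError("accuracy is undefined for an empty sample list")
    correct = sum(1 for s in samples if predict(s) == s.answer_index)
    return correct / len(samples)


# Invented toy items for illustration only; not drawn from the dataset.
toy_samples = [
    MCQSample(
        question="Deficiency of which vitamin causes scurvy?",
        options=("Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"),
        answer_index=2,
        explanation="Scurvy results from a lack of vitamin C.",
        subject="Biochemistry",
        topic="Vitamins",
    ),
    MCQSample(
        question="Which organ produces insulin?",
        options=("Pancreas", "Liver", "Kidney", "Spleen"),
        answer_index=0,
        explanation="Insulin is secreted by beta cells of the pancreas.",
        subject="Physiology",
        topic="Endocrine system",
    ),
]
```

A model under evaluation would be wrapped as the `predict` callable, which maps a sample to the index of the option it selects; `accuracy(toy_samples, predict)` then gives the share of questions answered correctly.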

Code Implementations: 1 repo
