CL AI Apr 9, 2023

FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain

Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud

arXiv:2304.04280v129.6296 citations h-index: 49 Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a new benchmark for French medical NLP, addressing a gap in publicly available resources. Its contribution is incremental, since it adapts existing MCQA formats to a specific language and domain.

The authors introduced FrenchMedMCQA, the first French medical multiple-choice question answering dataset with 3,105 questions from real pharmacy exams, and established baseline models showing that English specialized models outperformed generic French ones despite the dataset being in French.

This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain. It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single-answer and multiple-answer questions. Each instance of the dataset contains an identifier, a question, five possible answers and their manual correction(s). We also propose first baseline models for this MCQA task in order to report on current performance and to highlight the difficulty of the task. A detailed analysis of the results shows that representations adapted to the medical domain or to the MCQA task are necessary: in our case, English specialized models yielded better results than generic French ones, even though FrenchMedMCQA is in French. The corpus, models and tools are available online.
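The instance layout described in the abstract (an identifier, a question, five possible answers, and one or more corrections) can be sketched as a small Python record. This is only an illustration under stated assumptions: the class name `MCQAInstance`, its field names, the "a" to "e" option labels, and the exact-match scoring helpers are hypothetical and are not taken from the released corpus or tools.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MCQAInstance:
    """Hypothetical record layout; field names are assumptions, not the released schema."""

    identifier: str
    question: str
    options: dict[str, str]   # five candidate answers keyed by an assumed label ("a".."e")
    correct: frozenset[str]   # one or more labels marked correct

    def __post_init__(self) -> None:
        # The abstract states that every question has five possible answers.
        if len(self.options) != 5:
            raise ValueError("expected exactly five answer options")
        # Single-answer and multiple-answer questions are both allowed.
        if not self.correct or not self.correct <= set(self.options):
            raise ValueError("correct labels must be a non-empty subset of the options")


def exact_match(instance: MCQAInstance, predicted: set[str]) -> bool:
    """Return True only if the predicted label set equals the reference set."""
    return frozenset(predicted) == instance.correct


def exact_match_rate(instances, predictions) -> float:
    """Fraction of questions whose predicted label set is exactly right."""
    pairs = list(zip(instances, predictions))
    if not pairs:
        return 0.0
    return sum(exact_match(inst, pred) for inst, pred in pairs) / len(pairs)
```

Strict set equality is one plausible way to score a mix of single-answer and multiple-answer questions, because a partially correct selection counts as wrong. It is shown here as an assumption, not as the metric reported by the authors.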

