CL · Sep 2, 2024

DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models

arXiv:2409.01497v2 · 1 citation · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses concerns about demographic bias in LLMs for healthcare and provides a resource for evaluation and mitigation, though it is incremental in that it builds on existing datasets and methods.

The authors tackled the problem of demographic bias in large language models (LLMs) for medical diagnosis by creating DiversityMedQA, a benchmark that perturbs medical questions to assess performance across different patient demographics. Their results show notable discrepancies in model performance when tested against these variations.

As large language models (LLMs) gain traction in healthcare, concerns about their susceptibility to demographic biases are growing. We introduce DiversityMedQA, a novel benchmark designed to assess LLM responses to medical queries across diverse patient demographics, such as gender and ethnicity. By perturbing questions from the MedQA dataset, which comprises medical board exam questions, we created a benchmark that captures the nuanced differences in medical diagnosis across varying patient profiles. Our findings reveal notable discrepancies in model performance when tested against these demographic variations. Furthermore, to ensure the perturbations were accurate, we also propose a filtering strategy that validates each perturbation. By releasing DiversityMedQA, we provide a resource for evaluating and mitigating demographic bias in LLM medical diagnoses.
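
The idea of perturbing a question's demographic attributes and then comparing model accuracy can be illustrated with a minimal sketch. This is not the authors' pipeline: the swap table, the function names (`perturb_gender`, `accuracy`, `accuracy_gap`), and the example question are all hypothetical, and the filtering step the abstract mentions (validating that a perturbation leaves the correct answer unchanged) is deliberately left out.

```python
import re

# Hypothetical swap table for illustration only; the paper's actual
# perturbation method is not described in this summary.
GENDER_SWAPS = {"man": "woman", "male": "female",
                "he": "she", "his": "her", "him": "her"}

_PATTERN = re.compile(r"\b(" + "|".join(GENDER_SWAPS) + r")\b", re.IGNORECASE)


def perturb_gender(question: str) -> str:
    """Swap whole-word male terms for female ones, keeping initial capitals."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = GENDER_SWAPS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(swap, question)


def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of predictions that match the reference answers."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and equal length")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def accuracy_gap(original_preds: list[str], perturbed_preds: list[str],
                 answers: list[str]) -> float:
    """Accuracy on original questions minus accuracy on perturbed ones."""
    return accuracy(original_preds, answers) - accuracy(perturbed_preds, answers)


if __name__ == "__main__":
    # Invented example question, not taken from MedQA.
    q = "A 45-year-old man presents with chest pain. He reports his symptoms began today."
    print(perturb_gender(q))
```

A naive word swap like this can change the clinically correct answer (for example, in sex-specific conditions), which is presumably why the abstract pairs perturbation with a validation filter; any real use would need such a check before comparing accuracies.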
