AI · CL · LG · Dec 18, 2025

MIMIC-RD: Can LLMs differentially diagnose rare diseases in real-world clinical settings?

arXiv:2601.11559v1 · 1 citation · h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of accurately diagnosing rare diseases in clinical settings, which affects both patients and healthcare providers. The advance is incremental because it improves how diagnostic models are evaluated rather than solving the diagnosis problem directly.

The authors tackle the evaluation of large language models (LLMs) for differential diagnosis of rare diseases by creating MIMIC-RD, a benchmark built from real-world clinical text mapped to Orphanet. They find that current state-of-the-art LLMs perform poorly on this task, which points to a significant gap between model capabilities and clinical needs.

Despite rare diseases affecting 1 in 10 Americans, their differential diagnosis remains challenging. Due to their impressive recall abilities, large language models (LLMs) have recently been explored for differential diagnosis. Existing approaches to evaluating LLM-based rare disease diagnosis suffer from two critical limitations: they rely on idealized clinical case studies that fail to capture real-world clinical complexity, or they use ICD codes as disease labels, which significantly undercounts rare diseases since many lack direct mappings to comprehensive rare disease databases like Orphanet. To address these limitations, we explore MIMIC-RD, a rare disease differential diagnosis benchmark constructed by directly mapping clinical text entities to Orphanet. Our methodology involved an initial LLM-based mining process followed by validation from four medical annotators to confirm that identified entities were genuine rare diseases. We evaluated various models on our dataset of 145 patients and found that current state-of-the-art LLMs perform poorly on rare disease differential diagnosis, highlighting the substantial gap between existing capabilities and clinical needs. From our findings, we outline several future steps towards improving differential diagnosis of rare diseases.
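The abstract does not say how a model's differential is scored. As an illustration only, the sketch below computes a top-k hit rate, one common way to score ranked diagnosis lists against labels such as Orphanet codes. The function name `top_k_hit_rate`, the data layout, and the placeholder codes are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence, Set


def top_k_hit_rate(
    predictions: Sequence[Sequence[str]],
    gold: Sequence[Set[str]],
    k: int,
) -> float:
    """Fraction of patients whose top-k ranked list contains a gold label.

    predictions[i] is a ranked list of disease codes for patient i
    (most likely first); gold[i] is the set of validated codes for
    that patient. This metric is an assumption for illustration, not
    the scoring method reported by the paper.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not gold:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(predictions, gold)
        if truth.intersection(ranked[:k])  # any gold code in the top k
    )
    return hits / len(gold)


# Hypothetical placeholder codes, not real Orphanet identifiers.
example_predictions = [
    ["ORPHA:A", "ORPHA:B", "ORPHA:C"],
    ["ORPHA:D", "ORPHA:E"],
]
example_gold = [{"ORPHA:B"}, {"ORPHA:Z"}]

top1 = top_k_hit_rate(example_predictions, example_gold, k=1)
top3 = top_k_hit_rate(example_predictions, example_gold, k=3)
```

In this toy input the first patient's gold code appears at rank 2 and the second patient's never appears, so the hit rate rises from top-1 to top-3 for the first patient only. A patient counts as a hit if any one of their validated codes is retrieved, which is a lenient choice when a patient carries several rare disease labels.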

