CL, May 15, 2025

What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs

Xinlan Yan, Di Wu, Yibin Lei, Christof Monz, Iacer Calixto

arXiv:2505.10113v3, 1 citation, h-index: 18

Originality: Synthesis-oriented

AI Analysis

This work addresses how to use data more effectively when fine-tuning medical LLMs. It is incremental: it re-examines existing fine-tuning practice without proposing a new method.

The paper examines the role of clinical specialty data in medical question answering. It introduces S-MedQA, a dataset of over 20k examples across 15 specialties, and finds that training on specialty-specific data does not necessarily improve performance on that specialty; the gains likely come from domain shifting rather than specialty knowledge.

In this paper, we introduce S-MedQA, an English medical question-answering (QA) dataset for benchmarking large language models (LLMs) in fine-grained clinical specialties. S-MedQA contains over 20k examples and covers 15 medical specialties; a QA pair can carry multiple specialty annotations (e.g., when a question is cross-disciplinary). The dataset was constructed with both machine and expert verification to maximize data availability. We use S-MedQA to investigate the role of clinical specialty data in the knowledge-intensive scenario of medical QA. Our results show that 1) training on data from a clinical specialty does not necessarily lead to the best performance on that specialty, and 2) regardless of the specialty the LLM was fine-tuned on, token probabilities of clinically relevant terms increase consistently across all specialties. Thus, we hypothesize that performance gains derive mostly from domain shifting (e.g., general to medical) rather than specialty-specific knowledge injection, and suggest rethinking the role of fine-tuning data in the medical domain.
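
The multi-specialty annotation described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `QAExample` class, its field names, the `group_by_specialty` helper, the specialty labels, and the placeholder items are hypothetical and are not taken from S-MedQA or its release format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QAExample:
    """One multiple-choice QA item (hypothetical schema, not the S-MedQA format)."""
    question: str
    options: dict[str, str]
    answer: str
    # A cross-disciplinary question may carry more than one specialty label.
    specialties: set[str] = field(default_factory=set)


def group_by_specialty(examples):
    """Map each specialty to the examples annotated with it.

    An example with several labels is listed under every one of them,
    so the per-specialty subsets can overlap.
    """
    groups = defaultdict(list)
    for ex in examples:
        for spec in ex.specialties:
            groups[spec].append(ex)
    return dict(groups)


# Placeholder items for illustration only; not real dataset content.
examples = [
    QAExample("placeholder question 1", {"A": "x", "B": "y"}, "A", {"Cardiology"}),
    QAExample("placeholder question 2", {"A": "x", "B": "y"}, "B", {"Cardiology", "Neurology"}),
    QAExample("placeholder question 3", {"A": "x", "B": "y"}, "A", {"Neurology"}),
]

groups = group_by_specialty(examples)
for specialty in sorted(groups):
    print(specialty, len(groups[specialty]))
```

Under this assumed layout, a per-specialty fine-tuning or evaluation subset is just one entry of `groups`, and the second placeholder item contributes to both subsets.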

