CVMay 9

KEPIL: Knowledge-Enhanced Prompt-Image Learning for Prompt-Robust Disease Detection

Haozhe Luo, Shelley Zixin Shu, Ziyu Zhou, Robert Berke, Mauricio Reyes

arXiv:2605.0913263.7Has Code

AI Analysis

For clinicians and radiologists, KEPIL improves the reliability of vision-language models for long-tailed disease detection by reducing sensitivity to prompt variations.

KEPIL integrates curated medical knowledge to stabilize zero-shot disease detection in radiology, achieving state-of-the-art performance with a 4.11% average AUC improvement under prompt variation.

Vision--language models (VLMs) show promise for clinical decision support in radiology because they enable joint reasoning over radiological images and clinical text, thereby leveraging complementary clinical information. However, radiological findings are long-tailed in practice, leaving some conditions underrepresented and making zero-shot inference essential. Yet current CLIP-style medical VLMs are sensitive to prompt variations and often lack trustworthy external knowledge at inference time, which hinders reliable clinical deployment. We present \textit{KEPIL}, a prompt-robust framework that integrates curated medical knowledge to stabilize zero-shot generalization. KEPIL comprises: (i) \emph{dynamic prompt enrichment} using ontologies with LLM assistance, (ii) a \emph{semantic-aware contrastive loss} aligning embeddings of equivalent prompt variants via a dual-embedding objective, and (iii) \emph{entity-centric report standardization} to yield ontology-aligned representations. Across seven benchmarks, KEPIL achieves state-of-the-art zero-shot inference performance; under prompt-variation tests, it improves AUC by \(6.37\%\) on \textit{CheXpert} and by \(4.11\%\) on average. These results suggest that structured knowledge and robust prompt design are key to clinically reliable radiology-facing VLMs. Code will be released at https://github.com/Roypic/KEPIL.

View on arXiv PDF Code

Similar