CY AI CL · Feb 21, 2025

A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare

Manar Aljohani, Jun Hou, Sindhura Kommu, Xuan Wang

arXiv:2502.15871v216.437 citations · h-index: 3 · Has Code · EMNLP

Originality: Synthesis-oriented

AI Analysis

The paper tackles the problem of ensuring reliable and ethical LLM deployment in clinical settings, for an audience of healthcare practitioners and researchers. As a review paper, its contribution is incremental.

This survey addresses the underexplored trustworthiness of large language models (LLMs) in healthcare by comprehensively reviewing methodologies to mitigate risks across dimensions like truthfulness and fairness, analyzing their impact on reliability and identifying critical gaps.

The application of large language models (LLMs) in healthcare holds significant promise for enhancing clinical decision-making, medical research, and patient care. However, their integration into real-world clinical settings raises critical concerns around trustworthiness, particularly around dimensions of truthfulness, privacy, safety, robustness, fairness, and explainability. These dimensions are essential for ensuring that LLMs generate reliable, unbiased, and ethically sound outputs. While researchers have recently begun developing benchmarks and evaluation frameworks to assess LLM trustworthiness, the trustworthiness of LLMs in healthcare remains underexplored, lacking a systematic review that provides a comprehensive understanding and future insights. This survey addresses that gap by providing a comprehensive review of current methodologies and solutions aimed at mitigating risks across key trust dimensions. We analyze how each dimension affects the reliability and ethical deployment of healthcare LLMs, synthesize ongoing research efforts, and identify critical gaps in existing approaches. We also identify emerging challenges posed by evolving paradigms, such as multi-agent collaboration, multi-modal reasoning, and the development of small open-source medical models. Our goal is to guide future research toward more trustworthy, transparent, and clinically viable LLMs.
