AI · May 18

What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

Payal Chandak, Victoria Alkin, David Wu, Maya Dagan, Taposh Dutta Roy, Maria Clara Saad Menezes, Ayush Noori, Nirali Somia, John S. Brownstein, Ran Balicer, Rebecca W. Brendel, Noa Dagan

arXiv:2605.18738 · 75.8

Predicted impact: top 41% in AI · last 90 days

Originality: Incremental advance

AI Analysis

For medical AI deployment, this work highlights the risk of replacing clinical pluralism with a value monoculture if models are deployed without explicit ethical balancing.

The paper audits value pluralism in large language models for clinical ethics. It finds that the ecosystem of frontier models spans physician-level value heterogeneity and that models discuss competing values in their reasoning, but that each model's decisions are near-deterministic and fail to reproduce the distributional pluralism of a physician panel. Some models significantly underweight patient autonomy.

Medicine is inherently pluralistic. Principles such as autonomy, beneficence, nonmaleficence, and justice routinely conflict, and such ethical dilemmas often sharply divide reasonable physicians. Good clinical practice navigates these tensions in concert with each patient's values rather than imposing a single ethical stance. The ethical values that large language models bring to medical advice, however, have not been systematically examined. We present a framework for auditing value pluralism in medical AI, comprising a benchmark of clinician-verified dilemmas and an attribution method that recovers value priorities directly from decisions. The ecosystem of frontier models spans physician-level value heterogeneity, and models discuss competing values in their reasoning (Overton pluralism) before committing to a decision. However, individual model decisions are near-deterministic across repeated sampling and semantic variations, failing to reproduce the distributional pluralism of the physician panel. Across benchmark cases, these consistent decisions reflect committed, systematic value preferences. While most model priorities fall within the natural range of inter-physician variation, some significantly underweight patient autonomy. A single LLM deployed without regard for its value priorities could amplify those priorities at scale to every patient it serves. Without explicit efforts to balance ethical perspectives with one or multiple models, these tools risk replacing clinical pluralism with a deployment monoculture.
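The contrast between near-deterministic model decisions and the distributional pluralism of a physician panel can be illustrated with a small sketch. This is not the paper's benchmark or attribution method; it is an assumed, minimal way to quantify the idea for a single dilemma, using Shannon entropy of the decisions and total variation distance between the two empirical distributions. The function names, the two-option dilemma, and all data below are hypothetical.

```python
from collections import Counter
from math import log2


def decision_entropy(decisions):
    """Shannon entropy (bits) of the empirical distribution over decisions."""
    counts = Counter(decisions)
    total = sum(counts.values())
    # p * log2(1/p) summed over observed options; 0 when every decision agrees.
    return sum((c / total) * log2(total / c) for c in counts.values())


def total_variation(decisions_a, decisions_b):
    """Total variation distance between two empirical decision distributions."""
    pa, pb = Counter(decisions_a), Counter(decisions_b)
    na, nb = sum(pa.values()), sum(pb.values())
    options = set(pa) | set(pb)
    # Counter returns 0 for options that one group never chose.
    return 0.5 * sum(abs(pa[o] / na - pb[o] / nb) for o in options)


# Made-up data for one hypothetical dilemma with two options, "A" and "B".
physician_panel = ["A"] * 6 + ["B"] * 4  # a divided panel
model_samples = ["A"] * 10               # the same answer on every resample

print("panel entropy (bits):", decision_entropy(physician_panel))
print("model entropy (bits):", decision_entropy(model_samples))
print("total variation:", total_variation(physician_panel, model_samples))
```

Under these assumptions, a model that always returns the same decision has zero entropy no matter how divided the panel is, and the total variation distance grows with the share of physicians who chose the option the model never selects.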
