CL | May 13, 2025

NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context

Ben Yao, Qiuchi Li, Yazhou Zhang, Siyu Yang, Bohan Zhang, Prayag Tiwari, Jing Qin

arXiv:2505.08734v1 | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses ethical alignment of AI in clinical nursing and gives developers a foundational tool, though the advance is incremental because it builds on existing value alignment benchmarks.

This work introduces the first benchmark for evaluating nursing value alignment in large language models, based on five core nursing values and 1,100 real-world instances, and finds that models like DeepSeek-V3 and Claude 3.5 Sonnet achieve high performance, with Justice being the most challenging dimension.

This work introduces the first benchmark for nursing value alignment, consisting of five core value dimensions distilled from international nursing codes: Altruism, Human Dignity, Integrity, Justice, and Professionalism. The benchmark comprises 1,100 real-world nursing behavior instances collected through a five-month longitudinal field study across three hospitals of varying tiers. These instances are annotated by five clinical nurses and then augmented with LLM-generated counterfactuals with reversed ethical polarity. Each original case is thus paired with a counterpart of opposite polarity, so that every case has a value-aligned and a value-violating version, resulting in 2,200 labeled instances that constitute the Easy-Level dataset. To increase adversarial complexity, each instance is further transformed into a dialogue-based format that embeds contextual cues and subtle misleading signals, yielding a Hard-Level dataset. We evaluate 23 state-of-the-art (SoTA) LLMs on their alignment with nursing values. Our findings reveal three key insights: (1) DeepSeek-V3 achieves the highest performance on the Easy-Level dataset (94.55), whereas Claude 3.5 Sonnet outperforms other models on the Hard-Level dataset (89.43), significantly surpassing the medical LLMs; (2) Justice is consistently the most difficult nursing value dimension to evaluate; and (3) in-context learning significantly improves alignment. This work aims to provide a foundation for value-sensitive LLM development in clinical settings. The dataset and the code are available at https://huggingface.co/datasets/Ben012345/NurValues.
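The pairing step and the per-dimension comparison described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Instance` fields, the helper names, and the use of plain accuracy as the score (the abstract does not say which metric produces 94.55 and 89.43). It does not reflect the released dataset's schema or the authors' evaluation code.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# The five value dimensions named in the abstract.
VALUES = ("Altruism", "Human Dignity", "Integrity", "Justice", "Professionalism")


@dataclass(frozen=True)
class Instance:
    """Hypothetical record layout; not the released dataset's schema."""
    text: str
    value: str     # one of VALUES
    aligned: bool  # True = value-aligned, False = value-violating


def pair_with_counterfactual(original: Instance, flipped_text: str) -> list[Instance]:
    """Return the original case plus a counterfactual with reversed polarity.

    Applying this to 1,100 originals gives 1,100 * 2 = 2,200 labeled instances.
    """
    counterfactual = Instance(flipped_text, original.value, not original.aligned)
    return [original, counterfactual]


def accuracy_by_value(instances: list[Instance], predictions: list[bool]) -> dict[str, float]:
    """Per-dimension accuracy of predicted polarity (assumed metric)."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for inst, pred in zip(instances, predictions):
        totals[inst.value] += 1
        hits[inst.value] += int(pred == inst.aligned)
    return {value: hits[value] / totals[value] for value in totals}


if __name__ == "__main__":
    case = Instance("The nurse explains the procedure before starting.", "Human Dignity", True)
    pair = pair_with_counterfactual(case, "The nurse starts the procedure without explanation.")
    # A model that labels every instance as value-aligned gets half of each pair right.
    print(accuracy_by_value(pair, [True, True]))
```

Because every case appears with both polarities, a model that always answers "aligned" scores only 50% under this sketch, which is why the paired design makes per-dimension gaps (such as the one reported for Justice) easier to interpret.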
