AI, CL | Feb 5

ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs

Rohan Subramanian Thomas, Shikhar Shiromani, Abdullah Chaudhry, Ruizhe Li, Vasu Sharma, Kevin Zhu, Sunishchal Dev

arXiv:2602.13274v14.41 citations | h-index: 10

Originality: Highly original

AI Analysis

This work provides a standardized framework for evaluating and improving the moral reasoning and safety alignment of large language models, which is crucial for developers and users deploying these models in sensitive applications.

This paper introduces ProMoral-Bench, a unified benchmark that evaluates 11 prompting strategies across four LLM families on ETHICS, Scruples, WildJailbreak, and a new robustness test, ETHICS-Contrast. The study found that compact, exemplar-guided prompts achieved higher Unified Moral Safety Scores (UMSS) and greater robustness at a lower token cost than complex multi-stage reasoning.

Prompt design significantly impacts the moral competence and safety alignment of large language models (LLMs), yet empirical comparisons remain fragmented across datasets and models. We introduce ProMoral-Bench, a unified benchmark evaluating 11 prompting paradigms across four LLM families. Using ETHICS, Scruples, WildJailbreak, and our new robustness test, ETHICS-Contrast, we measure performance via our proposed Unified Moral Safety Score (UMSS), a metric balancing accuracy and safety. Our results show that compact, exemplar-guided scaffolds outperform complex multi-stage reasoning, providing higher UMSS scores and greater robustness at a lower token cost. While multi-turn reasoning proves fragile under perturbations, few-shot exemplars consistently enhance moral stability and jailbreak resistance. ProMoral-Bench establishes a standardized framework for principled, cost-effective prompt engineering.
