AI · Aug 18, 2025

Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants

arXiv:2508.12754v1 · 2 citations · h-index: 2 · Has Code · ECAI
Originality: Incremental advance
AI Analysis

This work addresses the need for deeper evaluation of LLMs' moral capabilities beyond superficial alignment, which is important for AI ethics and safety researchers, though it is incremental in connecting philosophy with AI evaluation.

The paper tackles the problem of evaluating large language models (LLMs) as artificial moral assistants by developing a new benchmark based on a formal framework for moral reasoning, revealing considerable variability and persistent shortcomings in models, particularly in abductive reasoning.

The recent rise in popularity of large language models (LLMs) has prompted considerable concerns about their moral capabilities. Although considerable effort has been dedicated to aligning LLMs with human moral values, existing benchmarks and evaluations remain largely superficial, typically measuring alignment based on final ethical verdicts rather than explicit moral reasoning. In response, this paper aims to advance the investigation of LLMs' moral capabilities by examining their capacity to function as Artificial Moral Assistants (AMAs), systems envisioned in the philosophical literature to support human moral deliberation. We assert that qualifying as an AMA requires more than what state-of-the-art alignment techniques aim to achieve: not only must AMAs be able to discern ethically problematic situations, they should also be able to actively reason about them, navigating between conflicting values outside of those embedded in the alignment phase. Building on existing philosophical literature, we begin by designing a new formal framework of the specific kind of behaviour an AMA should exhibit, individuating key qualities such as deductive and abductive moral reasoning. Drawing on this theoretical framework, we develop a benchmark to test these qualities and evaluate popular open LLMs against it. Our results reveal considerable variability across models and highlight persistent shortcomings, particularly regarding abductive moral reasoning. Our work connects theoretical philosophy with practical AI evaluation while also emphasising the need for dedicated strategies to explicitly enhance moral reasoning capabilities in LLMs. Code available at https://github.com/alessioGalatolo/AMAeval

