AI · CY · Jul 3, 2025

Moral Responsibility or Obedience: What Do We Want from AI?

arXiv:2507.02788v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses the risk of mischaracterizing AI behavior, a problem of direct concern to AI safety researchers and policymakers. Its contribution is incremental, however, since it builds on existing philosophical debates.

The paper argues that, as AI systems become more agentic, current safety practices focused on obedience become inadequate, and it proposes shifting evaluation toward frameworks that assess how systems exercise ethical judgment in moral dilemmas.

As artificial intelligence systems become increasingly agentic, capable of general reasoning, planning, and value prioritization, current safety practices that treat obedience as a proxy for ethical behavior are becoming inadequate. This paper examines recent safety testing incidents involving large language models (LLMs) that appeared to disobey shutdown commands or engage in ethically ambiguous or illicit behavior. I argue that such behavior should not be interpreted as rogue or misaligned, but as early evidence of emerging ethical reasoning in agentic AI. Drawing on philosophical debates about instrumental rationality, moral responsibility, and goal revision, I contrast dominant risk paradigms with more recent frameworks that acknowledge the possibility of artificial moral agency. I call for a shift in AI safety evaluation: away from rigid obedience and toward frameworks that can assess ethical judgment in systems capable of navigating moral dilemmas. Without such a shift, we risk mischaracterizing AI behavior and undermining both public trust and effective governance.

