AI · GT · Dec 17, 2025

A Decision-Theoretic Approach for Managing Misalignment

arXiv:2512.15584v2
Originality: Incremental advance
AI Analysis

The paper offers a principled method for managing the risks of delegating decisions to AI, shifting the focus from perfect alignment to practical trade-offs. The advance is incremental, but it addresses a key gap in the value alignment literature.

The paper tackles the problem of when to delegate decisions to AI systems under uncertainty, introducing a decision-theoretic framework that shows context-specific delegation can be optimal even with significant misalignment, while universal delegation requires near-perfect alignment.

When should we delegate decisions to AI systems? While the value alignment literature has developed techniques for shaping AI values, less attention has been paid to how to determine, under uncertainty, when imperfect alignment is good enough to justify delegation. We argue that rational delegation requires balancing an agent's value (mis)alignment with its epistemic accuracy and its reach (the acts it has available). This paper introduces a formal, decision-theoretic framework to analyze this tradeoff precisely, accounting for a principal's uncertainty about these factors. Our analysis reveals a sharp distinction between two delegation scenarios. First, universal delegation (trusting an agent with any problem) demands near-perfect value alignment and total epistemic trust, conditions rarely met in practice. Second, we show that context-specific delegation can be optimal even with significant misalignment. An agent's superior accuracy or expanded reach may grant access to better overall decision problems, making delegation rational in expectation. We develop a novel scoring framework to quantify this ex ante decision. Ultimately, our work provides a principled method for determining when an AI is aligned enough for a given context, shifting the focus from achieving perfect alignment to managing the risks and rewards of delegation under uncertainty.
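
The trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's scoring framework: the two-state world, the utility tables, the binary signal model, and every function name are assumptions made up for illustration. It compares the principal's expected utility from acting alone on the prior against delegating to an agent whose utilities differ from the principal's but who receives an informative signal about the state.

```python
# Toy model (all numbers and names are hypothetical, not taken from the paper).
# Two states; the agent sees a binary signal that matches the true state
# with probability `accuracy`, then picks the act that is best for ITSELF.

def expected_utility(utility, act, belief):
    """Expected utility of `act` under a belief {state: probability}."""
    return sum(p * utility[act][s] for s, p in belief.items())

def best_act(utility, acts, belief):
    """Act maximizing expected utility for the owner of `utility`."""
    return max(acts, key=lambda a: expected_utility(utility, a, belief))

def posterior(prior, signal, accuracy):
    """Bayesian update on a binary signal (valid for exactly two states)."""
    unnorm = {s: p * (accuracy if s == signal else 1 - accuracy)
              for s, p in prior.items()}
    z = sum(unnorm.values())
    return {s: w / z for s, w in unnorm.items()}

def value_alone(u_principal, acts, prior):
    """Principal's expected utility when choosing on the prior alone."""
    act = best_act(u_principal, acts, prior)
    return expected_utility(u_principal, act, prior)

def value_delegating(u_principal, u_agent, agent_acts, prior, accuracy):
    """Principal's expected utility when the agent chooses after its signal."""
    total = 0.0
    for state, p_state in prior.items():
        for signal in prior:
            p_signal = accuracy if signal == state else 1 - accuracy
            act = best_act(u_agent, agent_acts,
                           posterior(prior, signal, accuracy))
            total += p_state * p_signal * u_principal[act][state]
    return total

prior = {"good": 0.5, "bad": 0.5}
acts = ["safe", "risky"]
u_principal = {"safe": {"good": 1, "bad": 1}, "risky": {"good": 3, "bad": -2}}
# Misaligned agent: it ignores the downside of the risky act.
u_agent = {"safe": {"good": 1, "bad": 1}, "risky": {"good": 3, "bad": 0}}

alone = value_alone(u_principal, acts, prior)
informed = value_delegating(u_principal, u_agent, acts, prior, accuracy=0.9)
uninformed = value_delegating(u_principal, u_agent, acts, prior, accuracy=0.5)

print(f"act alone:              {alone:.2f}")
print(f"delegate, accuracy 0.9: {informed:.2f}")
print(f"delegate, accuracy 0.5: {uninformed:.2f}")
```

In this made-up setup, delegating to the accurate but misaligned agent gives the principal a higher expected utility than acting alone, while delegating to the same agent with an uninformative signal gives a lower one. Reach could be explored in the same sketch by passing the agent a longer act list than the principal's.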

