AI · Jun 20, 2025

Resource Rational Contractualism Should Guide AI Alignment

MIT
arXiv:2506.17434v1 · 3 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses the AI alignment problem for systems operating in complex human environments. The advance appears incremental, since it builds on existing contractualist approaches to alignment.

The paper tackles the challenge of aligning AI systems with diverse human values by proposing Resource-Rational Contractualism (RRC), a framework that uses normatively-grounded heuristics to efficiently approximate stakeholder agreements, enabling dynamic adaptation to human social environments.

AI systems will soon have to navigate human environments and make decisions that affect people and other AI agents whose goals and values diverge. Contractualist alignment proposes grounding those decisions in agreements that diverse stakeholders would endorse under the right conditions, yet securing such agreement at scale remains costly and slow -- even for advanced AI. We therefore propose Resource-Rational Contractualism (RRC): a framework where AI systems approximate the agreements rational parties would form by drawing on a toolbox of normatively-grounded, cognitively-inspired heuristics that trade effort for accuracy. An RRC-aligned agent would not only operate efficiently, but also be equipped to dynamically adapt to and interpret the ever-changing human social world.
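The effort-for-accuracy trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's method: the heuristic names, the accuracy and cost figures, and the linear scoring rule (`stakes * expected_accuracy - cost_weight * effort_cost`) are all assumptions chosen for illustration. The sketch only shows how an agent might pick a cheaper approximation when little is at stake and a costlier one when much is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heuristic:
    """One entry in a hypothetical toolbox (all values are illustrative)."""
    name: str
    expected_accuracy: float  # assumed closeness to the ideal agreement, in [0, 1]
    effort_cost: float        # assumed compute/time cost, arbitrary units


def select_heuristic(toolbox, stakes, cost_weight=1.0):
    """Return the heuristic with the highest net value.

    Net value is an assumed linear trade-off:
    stakes * expected_accuracy - cost_weight * effort_cost.
    """
    def net_value(h):
        return stakes * h.expected_accuracy - cost_weight * h.effort_cost

    return max(toolbox, key=net_value)


# Hypothetical toolbox, ordered from cheapest to most expensive.
TOOLBOX = [
    Heuristic("follow_existing_rule", expected_accuracy=0.70, effort_cost=0.1),
    Heuristic("simulate_one_stakeholder", expected_accuracy=0.85, effort_cost=1.0),
    Heuristic("simulate_full_bargaining", expected_accuracy=0.99, effort_cost=10.0),
]

for stakes in (1.0, 10.0, 100.0):
    choice = select_heuristic(TOOLBOX, stakes)
    print(f"stakes={stakes}: {choice.name}")
```

With these assumed numbers, the cheapest heuristic is chosen at low stakes and the most expensive one at high stakes. A real system would need to estimate accuracy and cost rather than hard-code them.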

