AI · Oct 18, 2021

Value alignment: a formal approach

Carles Sierra, Nardine Osman, Pablo Noriega, Jordi Sabater-Mir, Antoni Perelló

arXiv:2110.09240v1 · 16.645 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses the challenge of value alignment for autonomous AI systems, which is crucial for safe and ethical deployment, but the approach appears incremental because it builds on existing formal methods.

The paper tackles the problem of ensuring autonomous AI systems align with human values by proposing a formal model to represent values as preferences and compute value aggregations, defining and computing value alignment for norms based on their impact on preferences for future world states.

Value alignment is one of the principles that should govern autonomous AI systems. It essentially states that a system's goals and behaviour should be aligned with human values. But how to ensure value alignment? In this paper we first provide a formal model to represent values through preferences and ways to compute value aggregations; i.e. preferences with respect to a group of agents and/or preferences with respect to sets of values. Value alignment is then defined, and computed, for a given norm with respect to a given value through the increase or decrease that the norm produces in the preferences over future states of the world. We focus on norms because it is norms that govern behaviour, and as such, the alignment of a given system with a given value will be dictated by the norms the system follows.
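
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's formal definition. It assumes that a value is represented as a preference function from world states to numbers, that aggregation over agents or values is a plain average, and that the alignment of a norm is the mean change in preference along state sequences produced while the norm is in force. The names `State`, `Preference`, `aggregate` and `alignment` are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence

State = dict  # hypothetical world-state representation
Preference = Callable[[State], float]  # assumed: higher means more preferred


def aggregate(prefs: Sequence[Preference]) -> Preference:
    """Combine several preference functions by plain averaging (an assumption)."""
    return lambda state: mean(p(state) for p in prefs)


def alignment(paths: Iterable[Sequence[State]], pref: Preference) -> float:
    """Mean change in preference over consecutive states of norm-governed paths.

    A positive result means the norm tends to move the world towards
    states the value prefers; a negative result means the opposite.
    """
    deltas = [
        pref(after) - pref(before)
        for path in paths
        for before, after in zip(path, path[1:])
    ]
    return mean(deltas) if deltas else 0.0


# Toy usage: an "equality" value that prefers a smaller wealth gap.
equality = lambda s: -s["gap"]
paths_under_norm = [[{"gap": 4}, {"gap": 2}, {"gap": 1}]]
score = alignment(paths_under_norm, equality)
```

In this toy run the gap shrinks at every step, so `score` is positive, which the sketch reads as the norm being aligned with the equality value. Passing the result of `aggregate` as `pref` would score the same norm against a group of agents or a set of values.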
