AI · GT · LG · Jan 5, 2017

Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making

arXiv:1701.01302v3 · 12 citations
AI Analysis

This paper addresses a foundational gap in MORL: cooperative decision-making on behalf of players with different beliefs. It is a new theoretical result rather than an incremental improvement on existing methods.

The paper studies multi-objective reinforcement learning (MORL) for players with differing beliefs and utility functions. It derives a recursion that any Pareto optimal policy must satisfy, under which the priority given to each player shifts over time according to how well that player's beliefs predict the machine's inputs. This diverges from naive linear aggregation, where the weights stay fixed.
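The contrast can be written out as follows. This is a sketch in our own notation, not the paper's stated recursion: $w_i$ are initial priority weights, $P_i$ is player $i$'s belief, $U_i$ their utility function, $o_{t+1}$ the machine's input at step $t+1$, and $h_t$ the history of inputs and actions up to step $t$. It paraphrases the two observations in the abstract; the proportionality hides a normalizing constant, which does not change the maximizing action.

```latex
% Naive linear aggregation: fixed weights and a single shared belief P
\pi^{\ast} \in \arg\max_{\pi}\; w_1\,\mathbb{E}_{P}\!\left[U_1 \mid \pi\right] + w_2\,\mathbb{E}_{P}\!\left[U_2 \mid \pi\right]

% Belief-sensitive weighting (observations 1 and 2), assumed notation:
% each player's own belief P_i, and weights rescaled by that player's
% predicted probability of the input actually received
w_i^{(t+1)} \;\propto\; w_i^{(t)}\, P_i\!\left(o_{t+1} \mid h_t\right),
\qquad
\text{objective at } t+1:\; \sum_{i} w_i^{(t+1)}\, \mathbb{E}_{P_i}\!\left[U_i \mid h_t, o_{t+1}\right]
```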

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine's policy will prioritize each player's interests over time. Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player's own beliefs in evaluating how well an action will serve that player's utility function, and (2) shift the relative priority it assigns to each player's expected utilities over time, by a factor proportional to how well that player's beliefs predict the machine's inputs. Observation (2) represents a substantial divergence from naïve linear utility aggregation (as in Harsanyi's utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.
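To make the two observations concrete, here is a minimal one-step toy in Python. Everything in it is hypothetical and chosen for illustration: the weather scenario, the utilities, and the helper names `subjective_eu`, `shift_weights`, and `choose`. It assumes each player treats the weather as independent across days with a fixed personal probability, and it shows only a single priority shift rather than the paper's full recursion over policies.

```python
from typing import Dict, List, Tuple

Belief = Dict[str, float]               # a player's distribution over the next input
Utility = Dict[Tuple[str, str], float]  # (action, input) -> that player's utility


def subjective_eu(belief: Belief, utility: Utility, action: str) -> float:
    """Observation (1): score an action for a player under that player's own beliefs."""
    return sum(p * utility[(action, x)] for x, p in belief.items())


def shift_weights(weights: List[float], beliefs: List[Belief], observed: str) -> List[float]:
    """Observation (2): scale each weight by the probability that player's beliefs
    assigned to the observed input, then renormalize (this does not change the argmax)."""
    raw = [w * b[observed] for w, b in zip(weights, beliefs)]
    total = sum(raw)
    return [r / total for r in raw]


def choose(weights: List[float], beliefs: List[Belief],
           utilities: List[Utility], actions: List[str]) -> str:
    """Pick the action maximizing the weighted sum of subjective expected utilities."""
    def score(action: str) -> float:
        return sum(w * subjective_eu(b, u, action)
                   for w, b, u in zip(weights, beliefs, utilities))
    return max(actions, key=score)


# Hypothetical scenario: player 0 expects rain, player 1 expects sun.
beliefs = [{"rain": 0.8, "sun": 0.2}, {"rain": 0.2, "sun": 0.8}]
utilities = [
    {("picnic", "rain"): 0.0, ("picnic", "sun"): 1.0,
     ("museum", "rain"): 0.6, ("museum", "sun"): 0.6},
    {("picnic", "rain"): 0.5, ("picnic", "sun"): 1.0,
     ("museum", "rain"): 0.0, ("museum", "sun"): 0.0},
]
actions = ["picnic", "museum"]

weights = [0.5, 0.5]
before = choose(weights, beliefs, utilities, actions)
weights = shift_weights(weights, beliefs, "rain")  # rain observed: player 0 predicted it better
after = choose(weights, beliefs, utilities, actions)
print(before, weights, after)
```

With equal initial weights the machine picks "picnic". After rain is observed, player 0's weight rises from 0.5 to 0.8 because their beliefs predicted that input better, and the choice switches to "museum". A fixed-weight linear aggregation would keep choosing "picnic".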

