AIDec 30, 2023

Principal-Agent Reward Shaping in MDPs

arXiv:2401.00298v121 citationsh-index: 10AAAI
Originality Incremental advance
AI Analysis

This addresses incentive alignment in multi-agent systems, offering computational solutions for specific cases, but it is incremental as it builds on existing economic and MDP frameworks.

The paper tackles the problem of reward shaping in principal-agent Markov Decision Processes (MDPs) under budget constraints, establishing that the problem is NP-hard and providing polynomial approximation algorithms for stochastic trees and deterministic finite-horizon decision processes.

Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: Stochastic trees and deterministic decision processes with a finite horizon.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes