LG, AI · Nov 5, 2023

Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return

arXiv:2311.02544v46.65 citations · h-index: 8

Originality: Incremental advance

AI Analysis

The paper addresses multi-objective decision-making under nonlinear preferences. The advance is incremental: it builds on existing MOMDP frameworks and contributes a new algorithmic approach.

The paper tackles the problem of maximizing expected scalarized return (ESR) in multi-objective reinforcement learning with nonlinear preferences. It derives an extended Bellman optimality formulation and proposes an approximation algorithm that computes an approximately optimal policy in pseudopolynomial time. Experiments show that there can be a substantial gap between the policy computed by this algorithm and alternative baselines.

We study multi-objective reinforcement learning with nonlinear preferences over trajectories. That is, we maximize the expected value of a nonlinear function over accumulated rewards (expected scalarized return or ESR) in a multi-objective Markov Decision Process (MOMDP). We derive an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and current accumulated reward. Using this formulation, we describe an approximation algorithm for computing an approximately optimal non-stationary policy in pseudopolynomial time for smooth scalarization functions with a constant number of rewards. We prove the approximation analytically and demonstrate the algorithm experimentally, showing that there can be a substantial gap between the optimal policy computed by our algorithm and alternative baselines.
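The idea of a Bellman recursion that tracks time and accumulated reward can be illustrated with a small sketch. This is not the authors' algorithm. It is a minimal finite-horizon dynamic program over the augmented state (state, accumulated reward, time), with accumulated rewards rounded to a grid so that the number of distinct reward vectors stays bounded. The toy MOMDP (`TRANSITIONS`), the horizon, the grid width `GRID`, and the scalarization `scalarize` are all hypothetical choices made for illustration.

```python
import math
from functools import lru_cache

HORIZON = 4   # assumed finite horizon
GRID = 0.5    # assumed discretization width for accumulated rewards

# Hypothetical toy MOMDP with two objectives.
# TRANSITIONS[state][action] = list of (probability, next_state, reward_vector)
TRANSITIONS = {
    0: {0: [(1.0, 0, (1.0, 0.0))], 1: [(1.0, 1, (0.0, 1.0))]},
    1: {0: [(1.0, 0, (1.0, 0.0))], 1: [(1.0, 1, (0.0, 1.0))]},
}


def scalarize(acc):
    """Smooth nonlinear scalarization of the accumulated reward vector."""
    return sum(math.log1p(x) for x in acc)


def discretize(acc):
    """Round each accumulated reward to the nearest multiple of GRID."""
    return tuple(round(x / GRID) * GRID for x in acc)


@lru_cache(maxsize=None)
def value(state, acc, t):
    """Return (best expected scalarized return, best action) from (state, acc, t)."""
    if t == HORIZON:
        # The scalarization is applied once, to the total accumulated reward.
        return scalarize(acc), None
    best_value, best_action = -math.inf, None
    for action, outcomes in TRANSITIONS[state].items():
        q = 0.0
        for prob, next_state, reward in outcomes:
            next_acc = discretize(tuple(a + r for a, r in zip(acc, reward)))
            q += prob * value(next_state, next_acc, t + 1)[0]
        if q > best_value:
            best_value, best_action = q, action
    return best_value, best_action


best_value, first_action = value(0, (0.0, 0.0), 0)
# Baseline: a stationary policy that always takes action 0.
always_first = scalarize((float(HORIZON), 0.0))
print(best_value, always_first)
```

In this toy problem the best policy must balance the two objectives, so its action depends on the reward accumulated so far and not only on the current state. A stationary policy that always takes the same action earns a lower scalarized return, which is the kind of gap the abstract refers to.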
