LG · Feb 5, 2024

Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers

arXiv:2402.02665v1 · 10.44 citations · h-index: 28 · AAMAS

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of unifying and enhancing reinforcement learning approaches for researchers and practitioners, but it appears incremental as it builds on existing utility-based methods.

The paper extends the utility-based paradigm from multi-objective reinforcement learning to single-objective reinforcement learning, enabling benefits such as multi-policy learning for uncertain objectives, risk-aware RL, discounting, and safe RL.

Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.
