Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning
This work aims to unify single-objective and multi-objective reinforcement learning under a single utility-based framework. Because it builds on utility-based methods already established in multi-objective reinforcement learning, the contribution appears incremental.
The paper extends the utility-based paradigm from multi-objective reinforcement learning to single-objective reinforcement learning, enabling benefits such as multi-policy learning for uncertain objectives, risk-aware RL, discounting, and safe RL.
Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.
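As an illustration of the idea that a user's utility function, and not only the environmental reward, can determine which policy is preferred, the following is a minimal sketch in a single-objective, risk-aware setting. It is not taken from the paper: the function names, the exponential form of the utility, the risk-aversion parameter `a`, and the sample returns are all hypothetical choices made for this example.

```python
import math
from statistics import mean


def expected_utility(returns, utility):
    """Average the utility of each episode's return.

    The utility is applied to each return before averaging, so a
    nonlinear utility can change which policy looks best.
    """
    return mean(utility(g) for g in returns)


def identity_utility(g):
    """Linear utility: recovers the ordinary expected return."""
    return g


def risk_averse_utility(g, a=0.5):
    """Hypothetical concave utility (exponential form); `a` is an assumed
    risk-aversion parameter, not a value from the paper."""
    return 1.0 - math.exp(-a * g)


# Hypothetical per-episode returns collected under two policies.
safe_returns = [4.0, 4.0, 4.0, 4.0]
risky_returns = [0.0, 0.0, 10.0, 10.0]

for name, u in [("identity", identity_utility), ("risk-averse", risk_averse_utility)]:
    safe_score = expected_utility(safe_returns, u)
    risky_score = expected_utility(risky_returns, u)
    preferred = "safe" if safe_score > risky_score else "risky"
    print(f"{name}: safe={safe_score:.3f} risky={risky_score:.3f} preferred={preferred}")
```

With the linear utility the risky policy has the higher average return, while the concave utility favours the safe policy. The environment's rewards are identical in both cases; only the assumed utility function differs.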