LG | Feb 5, 2024

Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

arXiv:2402.02665v1 | 4 citations | h-index: 28 | AAMAS
Originality: Synthesis-oriented
AI Analysis

This work aims to unify single-objective and multi-objective reinforcement learning under one utility-based framing. It appears incremental, since it builds on existing utility-based methods.

The paper extends the utility-based paradigm from multi-objective reinforcement learning to single-objective reinforcement learning, enabling benefits such as multi-policy learning for uncertain objectives, risk-aware RL, discounting, and safe RL.

Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.
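To make the idea of a user-defined utility function concrete, the following is a minimal sketch and not the paper's method. It assumes a single-objective setting in which a utility is applied to each episode's return before averaging. The return values, the function names, and the exponential form of the risk-averse utility are all hypothetical choices made for illustration.

```python
import math

# Hypothetical sampled episode returns for two policies (made-up numbers).
RETURNS = {
    "safe": [4.0, 5.0, 6.0],      # mean 5.0, low variance
    "risky": [-10.0, 5.0, 23.0],  # mean 6.0, high variance
}

def linear_utility(g):
    """Identity utility: reduces to ordinary expected return."""
    return g

def risk_averse_utility(g, scale=5.0):
    """Concave exponential utility (an assumed form): penalises low returns heavily."""
    return 1.0 - math.exp(-g / scale)

def expected_utility(returns, utility):
    """Average utility over a list of sampled episode returns."""
    return sum(utility(g) for g in returns) / len(returns)

def preferred(utility):
    """Name of the policy whose sampled returns give the highest expected utility."""
    return max(RETURNS, key=lambda name: expected_utility(RETURNS[name], utility))

print(preferred(linear_utility))
print(preferred(risk_averse_utility))
```

Under the identity utility the higher-mean policy is preferred, while the concave utility favours the low-variance one. This is one way a utility function can express risk-aware preferences over the same environmental rewards.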

