LG · Feb 2, 2024

To the Max: Reinventing Reward in Reinforcement Learning

arXiv:2402.01361v2 · 14 citations · h-index: 5 · Has Code · ICML
Originality: Incremental advance
AI Analysis

The paper addresses inefficient learning caused by poor reward design, a practical problem for RL practitioners. It appears incremental because it builds on existing reward-optimization concepts.

The paper tackles the challenge of reward function selection in reinforcement learning by introducing max-reward RL, which optimizes maximum rather than cumulative reward, and demonstrates its benefits over standard RL in goal-reaching environments.

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
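To make the contrast between cumulative and maximum reward concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the max-reward objective takes the discounted form max_t γ^t r_t, which this page does not confirm. The function names and the two reward sequences are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def cumulative_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Standard discounted return: sum over t of gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def max_reward_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Illustrative max-reward objective: max over t of gamma^t * r_t.

    The discounted-max form is an assumption made for this sketch.
    `rewards` must be non-empty.
    """
    return max((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical dense rewards in a goal-reaching task (higher = closer to goal).
loiter = [0.5] * 10                 # hovers at a medium distance for 10 steps
reach = [0.1, 0.2, 0.4, 0.8, 1.0]   # approaches and reaches the goal in 5 steps

for name, traj in [("loiter", loiter), ("reach", reach)]:
    print(name, cumulative_return(traj, gamma=1.0), max_reward_return(traj, gamma=1.0))
```

With gamma set to 1, the loitering trajectory has the higher cumulative return (5.0 against roughly 2.5) but the lower maximum reward (0.5 against 1.0). Under these assumed rewards, the two objectives rank the same pair of behaviors differently, which is the kind of sensitivity to reward design the abstract describes.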

Code Implementations: 1 repo
