RO LG Jun 29, 2024

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jagersand, A. Rupam Mahmood

arXiv:2407.00324v213.017 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient robot learning for tasks such as pick-and-place and offers a practical alternative to dense rewards. The advance is incremental: it revisits and validates an existing sparse-reward approach.

The paper compares sparse (minimum-time) and dense reward paradigms for goal-reaching reinforcement learning. It finds that sparse rewards can yield higher-quality policies that outperform dense-reward policies even on the dense-reward policies' own metrics. In experiments on real robots, pixel-based policies are learned from scratch in two to three hours.

Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state, called minimum-time tasks. Despite this simplicity, such formulations are often overlooked in favor of dense rewards due to their perceived difficulty and lack of informativeness. Our studies contrast the two reward paradigms, revealing that the minimum-time task specification not only facilitates learning higher-quality policies but can also surpass dense-reward-based policies on their own performance metrics. Crucially, we also identify the goal-hit rate of the initial policy as a robust early indicator for learning success in such sparse feedback settings. Finally, using four distinct real-robotic platforms, we show that it is possible to learn pixel-based policies from scratch within two to three hours using constant negative rewards.
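
The minimum-time specification described in the abstract can be illustrated with a small sketch. The toy 1-D task below is not from the paper: the environment, the function names (`min_time_step`, `run_episode`, `goal_hit_rate`) and the two policies are hypothetical, and the goal-hit rate is assumed here to mean the fraction of time-limited episodes in which a policy reaches the goal. It shows only the reward structure: -1 on every step, with termination on reaching the goal, so the return equals the negative number of steps taken.

```python
import random


def min_time_step(pos, action, goal, size):
    """One step of a toy 1-D reaching task (hypothetical, for illustration).

    The reward is a constant -1 on every step, and the episode terminates
    when the goal is reached, as in the minimum-time formulation.
    """
    next_pos = max(0, min(size - 1, pos + action))  # clip to the track
    done = next_pos == goal
    return next_pos, -1.0, done


def run_episode(policy, start, goal, size, max_steps, rng):
    """Roll out one time-limited episode; return (return, goal_reached)."""
    pos, ret = start, 0.0
    for _ in range(max_steps):
        pos, reward, done = min_time_step(pos, policy(pos, rng), goal, size)
        ret += reward
        if done:
            return ret, True
    return ret, False  # time limit hit without reaching the goal


def goal_hit_rate(policy, episodes, seed=0, **task):
    """Fraction of episodes that reach the goal (assumed definition)."""
    rng = random.Random(seed)
    hits = sum(run_episode(policy, rng=rng, **task)[1] for _ in range(episodes))
    return hits / episodes


def random_policy(pos, rng):
    """Stand-in for an untrained initial policy: step left or right at random."""
    return rng.choice((-1, 1))


def greedy_policy(pos, rng):
    """Always step right; optimal when the goal lies to the right of the start."""
    return 1


if __name__ == "__main__":
    task = dict(start=0, goal=5, size=10, max_steps=20)
    print("greedy return:", run_episode(greedy_policy, rng=random.Random(0), **task)[0])
    print("random-policy goal-hit rate:", goal_hit_rate(random_policy, 200, **task))
```

In this sketch a shorter path to the goal always gives a higher return, which is why the -1-per-step reward aligns with reaching the goal as soon as possible. Measuring `goal_hit_rate` for the untrained policy before any learning mirrors, in a simplified way, the early indicator the abstract mentions.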
