LG · AI · Jun 6, 2024

Bootstrapping Expectiles in Reinforcement Learning

arXiv:2406.04081v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses overestimation and robustness issues in reinforcement learning with a novel method that, when combined with domain randomization, is competitive with state-of-the-art robust RL agents. It appears incremental, however, because it builds on existing expectile concepts.

The paper tackles overestimation and lack of robustness in reinforcement learning by replacing the expectation in the Bellman operator with an expectile, which introduces pessimism. The proposed method, ExpectRL, outperforms the classic twin-critic approach on the overestimation problem and is more robust than classic RL algorithms on benchmarks involving changes to the environment.
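The mechanism summarized above can be written with standard definitions. The notation below ($\mathcal{E}_\tau$, $\mathcal{T}_\tau$, $V$) is ours and may differ from the paper's. It assumes the usual asymmetric-squared definition of the $\tau$-expectile, under which $\tau = 1/2$ gives the mean and $\tau < 1/2$ gives a value below the mean, hence pessimism:

$$
\mathcal{E}_\tau(X) = \arg\min_{m \in \mathbb{R}} \mathbb{E}\left[\,\lvert \tau - \mathbb{1}\{X < m\} \rvert\,(X - m)^2\right], \qquad \tau \in (0, 1)
$$

$$
(\mathcal{T}_\tau Q)(s, a) = r(s, a) + \gamma\, \mathcal{E}_\tau\!\left[V(s')\right], \qquad s' \sim P(\cdot \mid s, a)
$$

Here $V(s') = \max_{a'} Q(s', a')$ in a Q-learning setting, or $Q(s', \pi(s'))$ for an actor-critic. With $\tau = 1/2$ the weights are symmetric, the minimizer is the mean, and $\mathcal{T}_\tau$ reduces to the usual Bellman operator.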

Many classic Reinforcement Learning (RL) algorithms rely on a Bellman operator, which involves an expectation over the next states, leading to the concept of bootstrapping. To introduce a form of pessimism, we propose to replace this expectation with an expectile. In practice, this can be done very simply by replacing the critic's $L_2$ loss with a more general expectile loss. Introducing pessimism in RL is desirable for various reasons, such as tackling the overestimation problem (for which classic solutions are double Q-learning or the twin-critic approach of TD3) or robust RL (where transitions are adversarial). We study these two cases empirically. For the overestimation problem, we show that the proposed approach, ExpectRL, provides better results than a classic twin-critic. On robust RL benchmarks involving changes to the environment, we show that our approach is more robust than classic RL algorithms. We also introduce a variation of ExpectRL combined with domain randomization, which is competitive with state-of-the-art robust RL agents. Finally, we extend ExpectRL with a mechanism for automatically choosing the expectile value, that is, the degree of pessimism.
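The abstract's key implementation step, swapping the critic's $L_2$ loss for an expectile loss, can be sketched in plain Python. This is a minimal illustration assuming the standard asymmetric-squared expectile loss with TD error u = target - prediction, so that tau < 0.5 is the pessimistic side. The function names `expectile_loss`, `expectile_td_loss` and `sample_expectile` are hypothetical and not taken from the paper's code, and the automatic choice of tau mentioned in the abstract is not modeled.

```python
from typing import Sequence


def expectile_loss(td_errors: Sequence[float], tau: float) -> float:
    """Mean asymmetric squared loss |tau - 1[u < 0]| * u**2, with u = target - prediction.

    tau = 0.5 gives half the usual mean squared error; tau < 0.5 down-weights
    targets lying above the prediction, which pulls the fit below the mean.
    """
    total = 0.0
    for u in td_errors:
        weight = tau if u >= 0 else 1.0 - tau
        total += weight * u * u
    return total / len(td_errors)


def expectile_td_loss(q_pred: Sequence[float], rewards: Sequence[float],
                      next_values: Sequence[float], gamma: float, tau: float) -> float:
    """Hypothetical critic loss on a batch of transitions: the TD errors
    r + gamma * V(s') - Q(s, a) go through the expectile loss instead of L2."""
    td_errors = [r + gamma * v - q for q, r, v in zip(q_pred, rewards, next_values)]
    return expectile_loss(td_errors, tau)


def sample_expectile(samples: Sequence[float], tau: float, iters: int = 100) -> float:
    """tau-expectile of a finite sample, found by bisection on the loss gradient."""
    lo, hi = min(samples), max(samples)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # g is proportional to minus the derivative of the expectile loss at `mid`;
        # it decreases in `mid`, so its sign tells which half contains the minimizer.
        g = sum(tau * (x - mid) if x >= mid else -(1.0 - tau) * (mid - x)
                for x in samples)
        if g > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Bootstrapped targets r + gamma * V(s') for one (s, a) and two sampled next states.
    targets = [0.0, 10.0]
    print(round(sample_expectile(targets, 0.5), 6))  # the mean
    print(round(sample_expectile(targets, 0.1), 6))  # a pessimistic value below the mean
```

With tau = 0.5 the sketch recovers the ordinary regression target (the mean, 5 here); with tau = 0.1 the fitted value drops to 1, which is the kind of pessimism the abstract describes. Only the loss changes; the rest of a TD3-style update would stay the same under this assumption.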
