LG | ML | Sep 15, 2019

Biased Estimates of Advantages over Path Ensembles

arXiv:1909.06851v1
Originality: Incremental advance
AI Analysis

This work addresses advantage estimation, which strongly affects the efficiency and performance of reinforcement learning algorithms. The advance is incremental because it builds on existing estimation methods.

The paper tackles the problem of advantage estimation in reinforcement learning by proposing biased estimates based on order statistics over path ensembles, which leads to more efficient exploration and substantial performance gains across various benchmarks.

The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path ensemble, which allows one to flexibly drive the learning process, towards or against risks. On top of this formulation, we systematically study the impacts of different methods for estimating advantages. Our findings reveal that biased estimates, when chosen appropriately, can result in significant benefits. In particular, for the environments with sparse rewards, optimistic estimates would lead to more efficient exploration of the policy space; while for those where individual actions can have critical impacts, conservative estimates are preferable. On various benchmarks, including MuJoCo continuous control, Terrain locomotion, Atari games, and sparse-reward environments, the proposed biased estimation schemes consistently demonstrate improvement over mainstream methods, not only accelerating the learning process but also obtaining substantial performance gains.
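
To make the idea of order statistics over a path ensemble concrete, here is a minimal sketch, not the authors' implementation. It assumes the ensemble at each time step is a set of k-step advantage estimates for several horizons k, and that the biased estimate is the maximum (optimistic) or minimum (conservative) over that set. The function names `k_step_advantages` and `ensemble_advantage`, the choice of horizons, and the use of a bootstrap value at the end of the trajectory are all illustrative assumptions.

```python
import numpy as np


def k_step_advantages(rewards, values, gamma, ks):
    """Return an array of shape (len(ks), T) of k-step advantage estimates.

    Assumption: `values` has length T + 1, where the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    ensemble = np.empty((len(ks), T))
    for row, k in enumerate(ks):
        for t in range(T):
            end = min(t + k, T)  # truncate the path at the trajectory end
            discounts = gamma ** np.arange(end - t)
            k_step_return = discounts @ rewards[t:end] + gamma ** (end - t) * values[end]
            ensemble[row, t] = k_step_return - values[t]
    return ensemble


def ensemble_advantage(ensemble, mode="max"):
    """Collapse the ensemble with an order statistic (hypothetical helper).

    "max" gives an optimistic estimate, "min" a conservative one,
    and "mean" an unbiased-style average for comparison.
    """
    if mode == "max":
        return ensemble.max(axis=0)
    if mode == "min":
        return ensemble.min(axis=0)
    if mode == "mean":
        return ensemble.mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")


# Sparse-reward toy trajectory: the only reward arrives at the last step.
rewards = [0.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]  # an uninformed critic, plus bootstrap value
ensemble = k_step_advantages(rewards, values, gamma=1.0, ks=[1, 3])
optimistic = ensemble_advantage(ensemble, "max")
conservative = ensemble_advantage(ensemble, "min")
```

In this toy trajectory the 1-step estimates see no reward at the first two steps, while the 3-step estimates do. Taking the maximum therefore credits the early actions, which is one way to read the abstract's claim that optimistic estimates help exploration under sparse rewards. Taking the minimum withholds that credit until the shorter paths agree.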

