LG, Jul 30, 2023

Variance Control for Distributional Reinforcement Learning

arXiv:2307.16152v1, 4 citations, h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses a gap in distributional RL: the validity of the learned Q-function estimator has rarely been examined. The advance appears incremental, since it builds on existing DRL frameworks.

The paper analyzes the bias and variance of approximation errors in distributional reinforcement learning, then proposes a new estimator, Quantiled Expansion Mean (QEM), and an algorithm, QEMRL, which the authors report improves sample efficiency and convergence on Atari and MuJoCo benchmarks.

Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we do some error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator, Quantiled Expansion Mean (QEM), and introduce a new DRL algorithm (QEMRL) from the statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and MuJoCo benchmark tasks and demonstrate that QEMRL achieves significant improvement over baseline algorithms in terms of sample efficiency and convergence performance.
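
For context, quantile-based DRL methods such as QR-DQN commonly recover the Q-value as the plain average of the estimated quantiles. The sketch below illustrates how noise in those quantile estimates carries over into the bias and variance of that plain-average Q estimate. It is not the paper's QEM estimator, whose definition is not given here. The Gaussian return distribution, the i.i.d. noise model, and all function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean, pvariance


def quantile_midpoints(n):
    """Midpoint quantile levels tau_i = (2i + 1) / (2n), for i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def q_from_quantiles(quantile_estimates):
    """Plain-average Q estimate: the mean of the quantile estimates."""
    return fmean(quantile_estimates)


def simulate(n_quantiles=32, noise_std=0.5, trials=2000, seed=0):
    """Monte Carlo estimate of the bias and variance of the plain-average Q.

    Assumptions (illustrative, not from the paper): returns follow
    N(1.0, 2.0^2), and each quantile estimate carries independent
    Gaussian noise with standard deviation `noise_std`.
    """
    rng = random.Random(seed)
    dist = NormalDist(mu=1.0, sigma=2.0)
    true_quantiles = [dist.inv_cdf(t) for t in quantile_midpoints(n_quantiles)]

    estimates = []
    for _ in range(trials):
        # Perturb every true quantile to mimic approximation error.
        noisy = [q + rng.gauss(0.0, noise_std) for q in true_quantiles]
        estimates.append(q_from_quantiles(noisy))

    bias = fmean(estimates) - dist.mean
    variance = pvariance(estimates)
    return bias, variance


if __name__ == "__main__":
    bias, variance = simulate()
    print(f"bias={bias:.4f}, variance={variance:.4f}")
```

Under these assumptions the variance of the plain average is noise_std² / n_quantiles (0.25 / 32 ≈ 0.0078 with the defaults), so the simulated variance should land near that value. Non-uniform or correlated noise across quantiles would change this figure.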

Code Implementations: 1 repo