LG · Jul 30, 2023

Variance Control for Distributional Reinforcement Learning

Qi Kuang, Zhoufan Zhu, Liwen Zhang, Fan Zhou

arXiv:2307.16152v17.74 citations · h-index: 5 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a gap in distributional RL research: the validity of the learned Q-function estimator is rarely examined. The advance appears incremental, since it builds on existing DRL frameworks.

The paper analyzes the bias and variance of approximation errors in distributional reinforcement learning, then proposes a new estimator, Quantiled Expansion Mean (QEM), and an algorithm, QEMRL, that show significant improvements over baselines in sample efficiency and convergence on Atari and MuJoCo benchmarks.

Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we conduct an error analysis and show theoretically how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator, Quantiled Expansion Mean (QEM), and introduce a new DRL algorithm (QEMRL) from the statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and MuJoCo benchmark tasks and demonstrate that QEMRL achieves significant improvement over baseline algorithms in terms of sample efficiency and convergence performance.
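
As background for the abstract's "Q-function estimator in the distributional setting", the sketch below shows the common baseline in quantile-based DRL (for example QR-DQN), where the Q-value is the plain average of the learned quantile values. It is not the paper's QEM estimator, whose form is not given on this page. The uniform toy return distribution, the noise scale, and all function names are assumptions made for illustration only.

```python
import numpy as np

def quantile_midpoints(n):
    """Midpoint quantile levels tau_i = (2i - 1) / (2n) for i = 1..n."""
    return (2 * np.arange(1, n + 1) - 1) / (2 * n)

def q_from_quantiles(quantile_values):
    """Baseline Q estimate: the plain average of the quantile estimates."""
    return float(np.mean(quantile_values))

# Toy return distribution (assumed): Uniform(0, 1), whose quantile
# function is F^{-1}(tau) = tau, so the exact quantiles equal the levels.
taus = quantile_midpoints(4)
exact_quantiles = taus
q_exact = q_from_quantiles(exact_quantiles)

# Hypothetical approximation error: add independent Gaussian noise to each
# quantile estimate and see how it propagates to the averaged Q estimate.
rng = np.random.default_rng(0)
noisy_quantiles = exact_quantiles + rng.normal(scale=0.1, size=(10_000, 4))
q_noisy = noisy_quantiles.mean(axis=1)

print("exact Q:", q_exact)
print("mean of noisy Q estimates:", q_noisy.mean())
print("std of noisy Q estimates:", q_noisy.std())
```

In this toy setting the averaged estimate stays centered on the true mean, while its spread comes entirely from the per-quantile noise. That spread is the kind of error term the abstract's bias and variance analysis is concerned with.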

