LG AI ML

May 8, 2020

Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov

arXiv:2005.04269v1 | 30.5 | 274 citations

Originality: Highly original

AI Analysis

This paper addresses overestimation bias, a key problem in off-policy reinforcement learning for continuous control, and offers a novel solution with significant reported performance gains.

The paper tackles overestimation bias in off-policy learning for continuous control by proposing Truncated Quantile Critics (TQC), which combines a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics. The authors report that TQC outperforms the previous state of the art on every environment in the continuous control benchmark suite, including a 25% improvement on Humanoid, the most challenging environment.

Overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of the critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.
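The abstract names the three ingredients but does not spell out the mechanics, so the following is only a minimal sketch of one way to read "truncation of the pooled critics' predictions". It assumes each critic outputs a fixed number of return atoms (quantile locations) for a state-action pair, that the atoms of all critics are pooled and sorted, and that a chosen number of the largest atoms is discarded before forming the learning target. The function names, the `drop_per_critic` parameter, and the toy numbers are hypothetical; the sketch leaves out how the critics themselves are trained.

```python
import numpy as np


def truncated_mixture_atoms(critic_atoms, drop_per_critic):
    """Pool the atoms of all critics, sort them, and drop the largest ones.

    critic_atoms: array-like of shape (n_critics, n_atoms); each row holds
        one critic's predicted return atoms for a single state-action pair.
    drop_per_critic: how many atoms to discard per critic (hypothetical
        name); larger values give a more pessimistic estimate.
    """
    critic_atoms = np.asarray(critic_atoms, dtype=float)
    n_critics, n_atoms = critic_atoms.shape
    if not 0 <= drop_per_critic < n_atoms:
        raise ValueError("drop_per_critic must be in [0, n_atoms)")
    # Ensembling: treat all critics' atoms as one mixture.
    pooled = np.sort(critic_atoms.reshape(-1))
    # Truncation: keep only the smallest atoms of the mixture.
    keep = n_critics * (n_atoms - drop_per_critic)
    return pooled[:keep]


def truncated_target_atoms(reward, discount, next_atoms, drop_per_critic,
                           done=False):
    """Shift and scale the kept atoms into target atoms for one transition."""
    kept = truncated_mixture_atoms(next_atoms, drop_per_critic)
    return reward + (1.0 - float(done)) * discount * kept


if __name__ == "__main__":
    # Two toy critics with four atoms each; the values are made up.
    atoms = [[1.0, 2.0, 3.0, 10.0],
             [0.0, 2.0, 4.0, 12.0]]
    kept = truncated_mixture_atoms(atoms, drop_per_critic=1)
    print(np.mean(atoms))  # 4.25, mean of the full mixture
    print(kept.mean())     # 2.0, mean after dropping the two largest atoms
```

Because the number of dropped atoms can be any integer from zero up to one less than the number of atoms per critic, the amount of pessimism can be tuned in small steps, which is one way to understand the abstract's claim of granular overestimation control.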
