LG · AI · Oct 5, 2021

Dropout Q-Functions for Doubly Efficient Reinforcement Learning

arXiv:2110.02034v2 · 163 citations
Originality: Incremental advance
AI Analysis

This addresses computational bottlenecks for researchers and practitioners who use ensemble-based RL methods, though the contribution is incremental, since it builds directly on REDQ.

The paper tackles the computational inefficiency of REDQ, a state-of-the-art sample-efficient reinforcement learning method, by proposing DroQ. DroQ uses dropout Q-functions to achieve sample efficiency comparable to REDQ's with much better computational efficiency, which makes it doubly efficient.

Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al., 2018a). To make REDQ more computationally efficient, we propose DroQ, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connections and layer normalization. Despite its simple implementation, our experimental results indicate that DroQ is doubly (sample and computationally) efficient: it achieved sample efficiency comparable to REDQ's, much better computational efficiency than REDQ, and computational efficiency comparable to SAC's.
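The abstract describes a dropout Q-function only as a simple Q-function with dropout and layer normalization, used in a small ensemble. The following is a minimal pure-Python sketch of that idea, not the authors' code. Several details are assumptions: the per-hidden-layer order Linear, Dropout, LayerNorm, ReLU; layer normalization without a learned gain or bias; an ensemble of two; a SAC-style soft target that takes the minimum over the ensemble; and dropout staying active when the target is computed. The names `DropoutQFunction` and `target_value`, and all default values (dropout rate, hidden size, `gamma`, `alpha`), are hypothetical and chosen for illustration.

```python
import math
import random


class DropoutQFunction:
    """Hypothetical minimal dropout Q-function (illustrative, not DroQ's code).

    Each hidden layer applies Linear -> Dropout -> LayerNorm -> ReLU (assumed
    order), followed by a linear head that outputs a scalar Q-value.
    """

    def __init__(self, in_dim, hidden_dim, n_hidden=2, dropout_rate=0.01, seed=0):
        self.rng = random.Random(seed)
        self.p = dropout_rate  # illustrative value, not taken from the paper
        dims = [in_dim] + [hidden_dim] * n_hidden
        self.hidden = [self._init_linear(a, b) for a, b in zip(dims[:-1], dims[1:])]
        self.head = self._init_linear(hidden_dim, 1)

    def _init_linear(self, d_in, d_out):
        # Small uniform initialization; weights are never trained in this sketch.
        scale = 1.0 / math.sqrt(d_in)
        w = [[self.rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]
        return w, [0.0] * d_out

    @staticmethod
    def _linear(x, layer):
        w, b = layer
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

    def _dropout(self, x, training):
        # Inverted dropout: zero each unit with probability p, rescale survivors.
        if not training or self.p == 0.0:
            return x
        keep = 1.0 - self.p
        return [xi / keep if self.rng.random() < keep else 0.0 for xi in x]

    @staticmethod
    def _layer_norm(x, eps=1e-5):
        # Normalize to zero mean and unit variance (no learned gain/bias here).
        mean = sum(x) / len(x)
        var = sum((xi - mean) ** 2 for xi in x) / len(x)
        return [(xi - mean) / math.sqrt(var + eps) for xi in x]

    def __call__(self, state, action, training=True):
        h = list(state) + list(action)
        for layer in self.hidden:
            h = self._linear(h, layer)
            h = self._dropout(h, training)
            h = self._layer_norm(h)
            h = [max(0.0, v) for v in h]  # ReLU
        return self._linear(h, self.head)[0]


def target_value(q_ensemble, reward, next_state, next_action, next_log_prob,
                 gamma=0.99, alpha=0.2, done=False):
    """Hypothetical SAC/REDQ-style soft target: min over a small ensemble.

    Keeping dropout active (training=True) here is an assumption.
    """
    q_min = min(q(next_state, next_action, training=True) for q in q_ensemble)
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * (q_min - alpha * next_log_prob)
```

For example, `qs = [DropoutQFunction(4, 8, seed=s) for s in range(2)]` builds a two-member ensemble for a 3-dimensional state and a 1-dimensional action, and `target_value(qs, 1.0, [0.1, 0.2, 0.3], [0.5], -1.0)` returns one soft TD target. In this sketch, dropout is the only source of randomness in each forward pass, which is one way to read how a small ensemble can stand in for REDQ's large one.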

Code Implementations: 2 repos
