LG · AI · RO · SY · ML · Sep 18, 2020

GRAC: Self-Guided and Self-Regularized Actor-Critic

arXiv:2009.08973v2 · 31 citations
AI Analysis

This paper addresses the slow learning caused by target networks in deep reinforcement learning for decision-making and control tasks. It presents a novel method rather than an incremental improvement.

The paper tackles the slowdown that target networks introduce in deep reinforcement learning. It introduces a self-regularized TD-learning method that prevents divergence without a target network, and a self-guided policy improvement method that combines policy gradients with zero-order optimization for more robust action search. The resulting algorithm matches or outperforms state-of-the-art results on the OpenAI gym tasks tested.
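To make the idea of regularizing TD learning without a target network concrete, here is a minimal NumPy sketch. The summary does not spell out the regularizer, so the form used here is an assumption: alongside the usual TD error, the loss penalizes any change in Q(s', a') relative to a snapshot taken before the update. The linear critic, the feature vectors, the weight `lam`, and the helper name `run_updates` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def run_updates(lam, steps=500, lr=0.1, gamma=0.99):
    """Gradient descent on a linear critic Q(s, a) = w . phi(s, a) for one transition.

    lam weights the assumed self-regularization term; lam=0 is plain TD
    with a frozen bootstrap value and no target network.
    """
    phi = np.array([1.0, 0.5])       # hypothetical features of (s, a)
    phi_next = np.array([0.8, 1.0])  # hypothetical features of (s', a')
    reward = 1.0
    w = np.zeros(2)

    # Snapshot taken before the update and then held constant (no gradient).
    q_next_frozen = w @ phi_next
    target = reward + gamma * q_next_frozen

    for _ in range(steps):
        td_err = w @ phi - target             # fit Q(s, a) to the TD target
        drift = w @ phi_next - q_next_frozen  # change in Q(s', a') since the snapshot
        grad = 2.0 * td_err * phi + 2.0 * lam * drift * phi_next
        w = w - lr * grad

    # Return the remaining TD error and how far Q(s', a') has moved.
    return abs(w @ phi - target), abs(w @ phi_next - q_next_frozen)

td_plain, drift_plain = run_updates(lam=0.0)
td_reg, drift_reg = run_updates(lam=1.0)
print(f"plain TD:       td_err={td_plain:.2e}, drift={drift_plain:.2e}")
print(f"regularized TD: td_err={td_reg:.2e}, drift={drift_reg:.2e}")
```

In this toy setting both runs fit the TD target, but the regularized run leaves Q(s', a') much closer to its snapshot, which is the kind of stabilizing effect a target network is normally used for. It is an illustration of the mechanism under the stated assumptions, not a reproduction of the paper's loss or results.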

Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision-making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network, which mitigates divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method to address divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization to search for actions associated with higher Q-values in a broad neighborhood. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor-critic algorithm. We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.
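The action search described in the abstract can be sketched as follows. The abstract does not name the zero-order optimizer, so this sketch assumes a cross-entropy-method-style loop: sample actions in a broad neighborhood of the actor's proposal, score them with the critic, refit the sampling distribution to the best samples, and keep the highest-scoring action seen. The critic `toy_q`, the function `zero_order_search`, and all hyperparameters are hypothetical stand-ins, not part of the paper.

```python
import numpy as np

def toy_q(actions):
    """Hypothetical critic at one fixed state: a smooth bump plus a small ripple
    that mimics local noise in a learned Q function."""
    actions = np.atleast_2d(actions)
    smooth = -np.sum((actions - 0.6) ** 2, axis=1)
    ripple = 0.05 * np.sin(40.0 * actions).sum(axis=1)
    return smooth + ripple

def zero_order_search(q_fn, a_init, sigma=0.3, n_samples=64, n_elite=8,
                      n_iters=5, seed=0):
    """CEM-style search for a higher-Q action around the actor's proposal a_init."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(a_init, dtype=float)
    std = np.full_like(mean, sigma)
    best_a = mean.copy()
    best_q = q_fn(best_a)[0]  # the actor's own action is the baseline to beat
    for _ in range(n_iters):
        # Sample candidates and keep them inside the assumed action bounds [-1, 1].
        cand = rng.normal(mean, std, size=(n_samples, mean.size))
        cand = np.clip(cand, -1.0, 1.0)
        scores = q_fn(cand)
        # Refit the sampling distribution to the top-scoring candidates.
        elite = cand[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
        top = int(np.argmax(scores))
        if scores[top] > best_q:
            best_a, best_q = cand[top].copy(), scores[top]
    return best_a, best_q

a_pi = np.array([-0.2, 0.1])  # hypothetical actor output for this state
a_star, q_star = zero_order_search(toy_q, a_pi)
print("actor action:", a_pi, "Q =", toy_q(a_pi)[0])
print("searched action:", a_star, "Q =", q_star)
```

Because the search is seeded with the actor's action and only replaces it when a candidate scores higher, the returned action is never worse under the critic than the proposal. Sampling over a wide neighborhood rather than following the local gradient is what makes the search less sensitive to small ripples in the Q estimate. In a full actor-critic loop, the actor would then be updated toward `a_star`; that update step is omitted here.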

Code Implementations · 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
