LGAIMLJun 19, 2020

NROWAN-DQN: A Stable Noisy Network with Noise Reduction and Online Weight Adjustment for Exploration

arXiv:2006.10980v139 citations
Originality Incremental advance
AI Analysis

This work addresses stability issues in noisy networks for reinforcement learning agents, particularly in action-sensitive environments, though it appears incremental as it builds directly on NoisyNets.

The paper tackles the problem of improving exploration stability in deep reinforcement learning by proposing NROWAN-DQN, which introduces noise reduction and online weight adjustment to NoisyNet-DQN, resulting in higher scores and significantly reduced variance in performance across four standard domains.

Deep reinforcement learning has been applied more and more widely nowadays, especially in various complex control tasks. Effective exploration for noisy networks is one of the most important issues in deep reinforcement learning. Noisy networks tend to produce stable outputs for agents. However, this tendency is not always enough to find a stable policy for an agent, which decreases efficiency and stability during the learning process. Based on NoisyNets, this paper proposes an algorithm called NROWAN-DQN, i.e., Noise Reduction and Online Weight Adjustment NoisyNet-DQN. Firstly, we develop a novel noise reduction method for NoisyNet-DQN to make the agent perform stable actions. Secondly, we design an online weight adjustment strategy for noise reduction, which improves stable performance and gets higher scores for the agent. Finally, we evaluate this algorithm in four standard domains and analyze properties of hyper-parameters. Our results show that NROWAN-DQN outperforms prior algorithms in all these domains. In addition, NROWAN-DQN also shows better stability. The variance of the NROWAN-DQN score is significantly reduced, especially in some action-sensitive environments. This means that in some environments where high stability is required, NROWAN-DQN will be more appropriate than NoisyNets-DQN.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes