LG, RO · Feb 25, 2022

Consolidated Adaptive T-soft Update for Deep Reinforcement Learning

arXiv:2202.12504v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses the hyperparameter-tuning and implementation issues of the T-soft target-network update used in DRL for robotics, and represents an incremental improvement.

The paper tackles the instability of deep reinforcement learning by proposing a consolidated adaptive T-soft (CAT-soft) update that makes the target-network update more robust to noise and helps the target network asymptotically match the main network. The method is verified through numerical simulations.
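For context, the conventional soft update, which T-soft update is designed to make noise-robust, blends the main-network parameters θ into the target parameters θ̄ at a fixed rate τ: θ̄ ← (1 − τ)θ̄ + τθ. The minimal NumPy sketch below (the function name `soft_update` and all numbers are illustrative assumptions, not taken from the paper) shows why this rule is sensitive to noise: a single outlying main-network value moves the target by exactly τ times that value.

```python
import numpy as np

def soft_update(target, main, tau=0.005):
    """Conventional soft (Polyak) update: target <- (1 - tau) * target + tau * main."""
    return (1.0 - tau) * target + tau * main

target = np.zeros(3)
main = np.array([0.5, 0.5, 1000.0])  # third entry stands in for a noisy outlier
target = soft_update(target, main, tau=0.01)
print(target[2])  # 10.0: the outlier drags the target by tau * 1000
```

The fixed rate τ treats every parameter change the same, whether it reflects genuine learning progress or noise.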

Demand for deep reinforcement learning (DRL) has gradually increased as a way to enable robots to perform complex tasks, but DRL is known to be unstable. To stabilize learning, a target network that slowly and asymptotically matches the main network is widely employed to generate stable pseudo-supervised signals. Recently, the T-soft update has been proposed as a noise-robust update rule for the target network and has contributed to improving DRL performance. However, the noise robustness of the T-soft update is specified by a hyperparameter that must be tuned for each task, and this robustness is degraded by a simplified implementation. This study develops the adaptive T-soft (AT-soft) update by utilizing the update rule of AdaTerm, which was developed recently. In addition, the concern that the target network does not asymptotically match the main network is mitigated by a new consolidation that brings the main network back toward the target network. The resulting consolidated AT-soft (CAT-soft) update is verified through numerical simulations.
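The abstract does not give the update equations, so what follows is only a rough sketch under stated assumptions, not the AT-soft or CAT-soft rule itself. It assumes that T-soft-style robustness can be illustrated by shrinking each parameter's step with a Student-t-style weight ν / (ν + d²/σ²), where d is the gap between main and target parameters and σ² is a running variance of that gap. A smaller ν gives stronger robustness, and ν → ∞ recovers the conventional soft update. Here ν is a fixed hyperparameter, which is precisely what AT-soft replaces with the adaptive rule from AdaTerm (not reproduced here). The consolidation is likewise assumed to be one gradient step on (κ/2)‖θ − θ̄‖² that pulls the main parameters back toward the target. The names `t_weighted_soft_update`, `consolidation_step`, and `kappa` are hypothetical.

```python
import numpy as np

def t_weighted_soft_update(target, main, var, tau=0.01, nu=1.0, eps=1e-8):
    """Hypothetical Student-t-weighted soft update (simplified illustration).

    Each parameter's rate is tau * nu / (nu + d**2 / var), where d = main - target.
    Gaps that are large relative to the running variance (likely noise) get a much
    smaller step. As nu -> infinity this reduces to the conventional soft update.
    """
    d = main - target
    tau_eff = tau * nu / (nu + d**2 / (var + eps))
    new_target = target + tau_eff * d
    # Running variance of the gap, updated with the same down-weighted rate
    new_var = (1.0 - tau_eff) * var + tau_eff * d**2
    return new_target, new_var

def consolidation_step(main, target, kappa=0.1):
    """Hypothetical consolidation: one gradient step on (kappa / 2) * ||main - target||^2,
    pulling the main parameters back toward the target parameters."""
    return main - kappa * (main - target)

target = np.zeros(3)
var = np.ones(3)
main = np.array([0.5, 0.5, 1000.0])  # third entry stands in for a noisy outlier
target, var = t_weighted_soft_update(target, main, var, tau=0.01, nu=1.0)
print(target)  # small gaps move by about 0.004; the outlier entry by only about 1e-5
main = consolidation_step(main, target, kappa=0.1)
```

With ν = 1 and the same τ = 0.01, the outlier moves the target by roughly 1e-5 rather than the 10.0 a conventional soft update would produce, while ordinary gaps of 0.5 still move it by about 0.004. Having to pick ν by hand for each task is the limitation the paper attributes to T-soft update.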

