LG · AI · Jun 30, 2023

Resetting the Optimizer in Deep RL: An Empirical Study

arXiv:2306.17833v2 · 35 citations · h-index: 27
Originality: Synthesis-oriented
AI Analysis

The paper addresses a specific optimization issue in deep RL and offers a simple fix for practitioners, although the contribution is incremental.

The paper tackles the problem of optimizer contamination in deep reinforcement learning, caused by loss landscapes that change from one iteration to the next, and finds that resetting the optimizer's internal parameters at the start of each iteration significantly improves performance on the Atari benchmark.
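For Adam, the optimizer named in the paper's abstract, the "internal parameters" are the first- and second-moment estimates of the gradient and the step counter used for bias correction. The standard Adam update (Kingma & Ba) and the reset can be written as below, where $\mathcal{L}_k$ is the loss of iteration $k$ and squares are elementwise. Treating an "iteration" as the interval between target-network refreshes is an assumption here, not a detail stated on this page.

```latex
\begin{aligned}
g_t &= \nabla_\theta \mathcal{L}_k(\theta_{t-1}) \\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2} \\
\hat m_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^{t}} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} \\[4pt]
\text{reset at the start of iteration } k: &\quad m \leftarrow 0,\; v \leftarrow 0,\; t \leftarrow 0
\end{aligned}
```

Without the reset, $m$ and $v$ entering iteration $k$ still carry gradients of $\mathcal{L}_{k-1}, \mathcal{L}_{k-2}, \dots$, which is the contamination the paper describes.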

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process consists of solving a sequence of optimization problems in which the loss function changes at every iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm, such as Adam. These optimizers maintain their own internal parameters, such as estimates of the first-order and second-order moments of the gradient, and update them over time. Therefore, information obtained in previous iterations is used to solve the optimization problem in the current iteration. We demonstrate that this can contaminate the moment estimates, because the optimization landscape can change arbitrarily from one iteration to the next. To hedge against this negative effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.
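A minimal pure-Python sketch of the resetting idea, assuming plain Adam: `Adam.reset()` discards the moment estimates and step counter, and the hypothetical helper `solve_sequence` calls it at the start of each iteration of a toy problem whose target changes per iteration. This is not the authors' code and does not implement Rainbow; the quadratic loss $(w - y_k)^2$ is only an illustrative stand-in for a value-function loss whose target moves between iterations.

```python
import math
from typing import List


class Adam:
    """Minimal Adam (Kingma & Ba) over a flat list of float parameters."""

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.reset()

    def reset(self) -> None:
        """Drop all internal state, as if the optimizer were newly constructed."""
        self.m: List[float] = []  # first-moment (mean) estimates of the gradient
        self.v: List[float] = []  # second-moment (uncentred) estimates
        self.t = 0                # step counter used for bias correction

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return the parameters after one Adam step."""
        if not self.m:  # allocate zero moments lazily after construction/reset
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected mean
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)  # bias-corrected 2nd moment
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


def solve_sequence(targets: List[float], steps_per_iter: int,
                   reset_each_iter: bool, lr: float = 0.05) -> List[float]:
    """Hypothetical toy: iteration k minimises (w - targets[k]) ** 2.

    The loss changes whenever the target changes, loosely mimicking how the
    regression target moves between value-function iterations. Returns w at
    the end of each iteration.
    """
    opt = Adam(lr=lr)
    w = [0.0]
    history = []
    for y in targets:
        if reset_each_iter:
            opt.reset()  # the studied modification: fresh moments per iteration
        for _ in range(steps_per_iter):
            w = opt.step(w, [2.0 * (w[0] - y)])  # gradient of (w - y) ** 2
        history.append(w[0])
    return history
```

One way to see the contamination in this sketch: after many steps with a constant positive gradient, a single step with the opposite gradient still moves the parameter in the old direction, because the first-moment estimate is dominated by stale gradients. Calling `reset()` first makes that step follow the new gradient immediately.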
