LG, CV, ML | Dec 22, 2020

Stochastic Gradient Variance Reduction by Solving a Filtering Problem

arXiv:2012.12418v20.104 citations | Has Code

This work addresses the problem of high gradient variance in stochastic gradient descent for deep neural network optimization, which is a common challenge for practitioners.

This paper introduces Filter Gradient Descent (FGD), an optimization algorithm that reduces stochastic gradient variance by treating gradient estimation as an adaptive filtering problem. FGD leverages historical states to improve current gradient estimates, leading to more accurate gradient directions and accelerated convergence in deep neural network training.

Deep neural networks (DNNs) are typically optimized using stochastic gradient descent (SGD). However, gradient estimates computed from stochastic samples tend to be noisy and unreliable, resulting in large gradient variance and poor convergence. In this paper, we propose Filter Gradient Descent (FGD), an efficient stochastic optimization algorithm that makes a consistent estimate of the local gradient by solving an adaptive filtering problem with different filter designs. Our method reduces variance in stochastic gradient descent by incorporating historical states to enhance the current estimate. It can correct noisy gradient directions as well as accelerate the convergence of learning. We demonstrate the effectiveness of the proposed Filter Gradient Descent on numerical optimization and on training neural networks, where it achieves superior and robust performance compared with traditional momentum-based methods. To the best of our knowledge, we are the first to provide a practical solution that integrates filtering into gradient estimation by drawing an analogy between gradient estimation and filtering problems in signal processing. (The code is available at https://github.com/Adamdad/Filter-Gradient-Decent)
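To make the filtering analogy concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a scalar Kalman filter with a random-walk state model as the filter, a one-dimensional quadratic objective f(x) = 0.5 * x^2, and Gaussian gradient noise. The names `ScalarKalmanFilter`, `noisy_grad`, and `run`, as well as all hyperparameter values, are hypothetical and chosen only for illustration.

```python
import random


class ScalarKalmanFilter:
    """Scalar Kalman filter with a random-walk state model (an assumed filter design).

    The hidden state is the true gradient; each stochastic gradient is
    treated as a noisy observation of it.
    """

    def __init__(self, process_var=1e-3, obs_var=1.0):
        self.q = process_var      # how fast the true gradient is assumed to drift
        self.r = obs_var          # assumed variance of the gradient noise
        self.estimate = 0.0
        self.p = 0.0              # variance of the current estimate
        self.initialized = False

    def update(self, observation):
        if not self.initialized:
            # Start from the first observation.
            self.estimate = observation
            self.p = self.r
            self.initialized = True
            return self.estimate
        self.p += self.q                       # predict: uncertainty grows
        gain = self.p / (self.p + self.r)      # Kalman gain, between 0 and 1
        self.estimate += gain * (observation - self.estimate)
        self.p *= 1.0 - gain                   # correct: uncertainty shrinks
        return self.estimate


def noisy_grad(x, rng, noise_std=1.0):
    """Stochastic gradient of f(x) = 0.5 * x**2, i.e. x plus Gaussian noise."""
    return x + rng.gauss(0.0, noise_std)


def run(use_filter, steps=200, lr=0.1, seed=0):
    """Gradient descent on the toy objective, optionally filtering each gradient."""
    rng = random.Random(seed)
    x = 5.0
    kf = ScalarKalmanFilter()
    for _ in range(steps):
        g = noisy_grad(x, rng)
        if use_filter:
            g = kf.update(g)   # replace the raw gradient with its filtered estimate
        x -= lr * g
    return x
```

Calling `run(use_filter=False)` and `run(use_filter=True)` with the same seed compares plain SGD with the filtered variant on identical noise. The sketch shows only the mechanics of placing a filter between the gradient oracle and the parameter update; it makes no claim about reproducing the paper's results, and the filter's lag depends heavily on the assumed `process_var` and `obs_var`.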
