Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization
This work addresses the challenge of noisy gradient estimates in stochastic optimization. Aimed at machine learning practitioners, it offers an incremental improvement over existing methods and can be combined with SGD with momentum and RMSProp.
The paper tackles gradient variance in stochastic optimization by introducing Kalman Gradient Descent, which applies Kalman filtering to the gradient estimates to reduce their variance adaptively. It supports the method with a theoretical convergence analysis and with experiments in areas such as neural networks and variational inference, where it reports improved performance.
We introduce Kalman Gradient Descent, a stochastic optimization algorithm that uses Kalman filtering to adaptively reduce gradient variance in stochastic gradient descent by filtering the gradient estimates. We present both a theoretical analysis of convergence in a non-convex setting and experimental results which demonstrate improved performance on a variety of machine learning areas including neural networks and black box variational inference. We also present a distributed version of our algorithm that enables large-dimensional optimization, and we extend our algorithm to SGD with momentum and RMSProp.
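As a rough illustration of what filtering the gradient estimates can look like, the sketch below runs an independent scalar Kalman filter on each gradient coordinate and steps along the filtered value. The random-walk model of the true gradient, the names `ScalarKalmanFilter`, `kalman_gd`, and `noisy_quadratic_grad`, and the parameters `process_var` and `obs_var` are all assumptions made for this example; it is not the algorithm as specified in the paper.

```python
import random


class ScalarKalmanFilter:
    """Scalar Kalman filter tracking one gradient coordinate.

    Assumed model (not taken from the paper): the true gradient follows a
    random walk with variance `process_var`, and each stochastic gradient
    is the true gradient plus noise with variance `obs_var`.
    """

    def __init__(self, process_var=1e-2, obs_var=1.0):
        self.process_var = process_var
        self.obs_var = obs_var
        self.mean = 0.0  # current estimate of the true gradient
        self.var = 1.0   # uncertainty of that estimate

    def update(self, noisy_grad):
        # Predict: a random walk keeps the mean and inflates the variance.
        prior_var = self.var + self.process_var
        # Correct: blend prediction and observation using the Kalman gain.
        gain = prior_var / (prior_var + self.obs_var)
        self.mean += gain * (noisy_grad - self.mean)
        self.var = (1.0 - gain) * prior_var
        return self.mean


def kalman_gd(x0, noisy_grad_fn, lr=0.1, steps=500, **filter_kwargs):
    """Gradient descent that steps along filtered gradient estimates."""
    x = list(x0)
    filters = [ScalarKalmanFilter(**filter_kwargs) for _ in x]
    for _ in range(steps):
        grads = noisy_grad_fn(x)
        x = [xi - lr * f.update(gi) for xi, f, gi in zip(x, filters, grads)]
    return x


def noisy_quadratic_grad(x, rng=random):
    # Hypothetical test problem: f(x) = 0.5 * ||x||^2 with unit Gaussian noise.
    return [xi + rng.gauss(0.0, 1.0) for xi in x]


if __name__ == "__main__":
    random.seed(0)
    print(kalman_gd([5.0, -3.0], noisy_quadratic_grad))
```

In this sketch, a larger `obs_var` relative to `process_var` yields a smaller Kalman gain, so the filtered gradient is smoother but reacts more slowly to changes in the true gradient.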