Asymptotic Analysis of Conditioned Stochastic Gradient Descent
The paper provides theoretical guarantees for a broad class of preconditioned SGD methods, an incremental but important contribution to optimization in machine learning.
The paper analyzes the asymptotic behavior of Conditioned Stochastic Gradient Descent algorithms, establishing weak convergence and asymptotic normality under mild assumptions, and shows that using an estimate of the inverse Hessian as the conditioning matrix makes the algorithm asymptotically optimal.
In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called Conditioned SGD, based on a preconditioning of the gradient direction. Using a discrete-time approach with martingale tools, we establish under mild assumptions the weak convergence of the rescaled sequence of iterates for a broad class of conditioning matrices including stochastic first-order and second-order methods. Almost sure convergence results, which may be of independent interest, are also presented. Interestingly, the asymptotic normality result consists in a stochastic equicontinuity property, so that when the conditioning matrix is an estimate of the inverse Hessian, the algorithm is asymptotically optimal.
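To make "preconditioning of the gradient direction" concrete, here is a minimal sketch of a conditioned SGD recursion of the generic form theta_{k+1} = theta_k - gamma_{k+1} C_k g_{k+1}, where g_{k+1} is a noisy gradient and C_k is the conditioning matrix. Everything specific in it is an assumption made for illustration and is not taken from the paper: the quadratic objective, the matrix `H`, the target `theta_star`, the standard Gaussian gradient noise, the step size 1/k, and the function names. The exact inverse Hessian stands in for the estimate that the abstract refers to.

```python
import numpy as np

def conditioned_sgd(grad, theta0, conditioner, n_steps, rng, step=lambda k: 1.0 / k):
    """Run theta <- theta - step(k) * C_k @ g_k for k = 1..n_steps.

    grad(theta, rng) returns a noisy gradient; conditioner(theta, k) returns
    the conditioning matrix C_k. Both are hypothetical callables for this sketch.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(1, n_steps + 1):
        g = grad(theta, rng)          # stochastic gradient at the current iterate
        C = conditioner(theta, k)     # conditioning matrix for this step
        theta = theta - step(k) * (C @ g)
    return theta

# Toy problem (assumed): f(theta) = 0.5 * (theta - theta_star)^T H (theta - theta_star)
H = np.array([[4.0, 0.0], [0.0, 0.5]])
theta_star = np.array([1.0, -2.0])

def noisy_grad(theta, rng):
    # Exact gradient plus standard Gaussian noise (noise model is an assumption)
    return H @ (theta - theta_star) + rng.standard_normal(theta.shape)

rng = np.random.default_rng(0)
H_inv = np.linalg.inv(H)
n_steps = 5000

# Identity conditioner: plain SGD
theta_plain = conditioned_sgd(noisy_grad, np.zeros(2), lambda th, k: np.eye(2), n_steps, rng)
# Inverse-Hessian conditioner: stand-in for an inverse Hessian estimate
theta_cond = conditioned_sgd(noisy_grad, np.zeros(2), lambda th, k: H_inv, n_steps, rng)

print("plain SGD error:      ", np.linalg.norm(theta_plain - theta_star))
print("conditioned SGD error:", np.linalg.norm(theta_cond - theta_star))
```

In this toy setting, with step size 1/k and C_k equal to the exact inverse Hessian, the error theta_n - theta_star reduces to the average of the preconditioned noise terms, so its covariance is H^{-1} H^{-1} / n (the noise covariance being the identity here). This is only an illustration of the kind of limiting covariance the asymptotic optimality statement concerns; the paper's setting, with estimated conditioning matrices and general objectives, is considerably broader.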