ML LG

Dec 2, 2022

Covariance Estimators for the ROOT-SGD Algorithm in Online Learning

arXiv:2212.01259v1

10.85 citations

h-index: 28

Originality: Incremental advance

AI Analysis

This work addresses a specific gap in online learning for statistical inference, providing tools for uncertainty quantification in ROOT-SGD, but it is incremental as it builds on an existing algorithm.

The paper tackles the problem of unknown asymptotic covariance in the ROOT-SGD algorithm, which limits uncertainty measurement, by developing two covariance estimators: a plug-in estimator with O(1/√t) convergence and a Hessian-free estimator for cases where Hessian information is unavailable.

Online learning naturally arises in many statistical and machine learning problems. The most widely used methods in online learning are stochastic first-order algorithms. Among this family of algorithms is the recently developed Recursive One-Over-T SGD (ROOT-SGD). ROOT-SGD is advantageous in that it converges at a non-asymptotically fast rate, and its estimator further converges to a normal distribution. However, this normal distribution has an unknown asymptotic covariance, so it cannot be used directly to measure uncertainty. To fill this gap, we develop two estimators for the asymptotic covariance of ROOT-SGD, which make statistical inference with ROOT-SGD possible. Our first estimator adopts the plug-in idea: each unknown component in the formula for the asymptotic covariance is replaced with its empirical counterpart. The plug-in estimator converges at the rate $\mathcal{O}(1/\sqrt{t})$, where $t$ is the sample size. Despite its quick convergence, the plug-in estimator relies on the Hessian of the loss function, which might be unavailable in some cases. Our second estimator is Hessian-free and overcomes this limitation. It uses the random-scaling technique, and we show that it is an asymptotically consistent estimator of the true covariance.
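The plug-in idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the paper's construction: the loss is least squares, the step size `eta` is constant, the ROOT-SGD recursion is written in its commonly stated form, the asymptotic covariance is assumed to take the sandwich form $H^{-1} S H^{-1}$ (with $H$ the Hessian and $S$ the gradient-noise covariance at the optimum), and the function name `root_sgd_plugin_cov` is hypothetical.

```python
import numpy as np

def root_sgd_plugin_cov(X, y, eta=0.05):
    """ROOT-SGD on the loss 0.5 * (a @ x - b)**2, with running plug-in estimates.

    Returns the final iterate and an estimate of H^{-1} S H^{-1}
    (assumed sandwich form; not taken from the paper).
    """
    n, d = X.shape
    x_prev = np.zeros(d)    # x_{t-1}
    x_prev2 = np.zeros(d)   # x_{t-2}
    v = np.zeros(d)         # recursive gradient estimate v_{t-1}
    H_hat = np.zeros((d, d))  # running mean of per-sample Hessians
    S_hat = np.zeros((d, d))  # running mean of gradient outer products
    for t in range(1, n + 1):
        a, b = X[t - 1], y[t - 1]
        g_prev = a * (a @ x_prev - b)     # gradient at x_{t-1}, sample t
        g_prev2 = a * (a @ x_prev2 - b)   # gradient at x_{t-2}, same sample
        # ROOT-SGD recursion: at t = 1 the correction term vanishes.
        v = g_prev + (1.0 - 1.0 / t) * (v - g_prev2)
        x_new = x_prev - eta * v
        # Plug-in step: replace H and S by their empirical counterparts.
        H_hat += (np.outer(a, a) - H_hat) / t
        S_hat += (np.outer(g_prev, g_prev) - S_hat) / t
        x_prev2, x_prev = x_prev, x_new
    H_inv = np.linalg.inv(H_hat)
    return x_prev, H_inv @ S_hat @ H_inv

# Usage on synthetic data (all values illustrative).
rng = np.random.default_rng(0)
n, d = 5000, 3
x_star = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((n, d))
y = X @ x_star + rng.standard_normal(n)
x_hat, cov_hat = root_sgd_plugin_cov(X, y)
std_err = np.sqrt(np.diag(cov_hat) / n)  # per-coordinate standard errors
print(x_hat, std_err)
```

In this setting the Hessian is available in closed form as the outer product of the feature vector; when it is not, this sketch does not apply, which is the case the Hessian-free estimator is meant to cover.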
