Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert
This work addresses the need for stable and efficient statistical inference in machine learning, particularly for smooth convex loss functions, by extending ISGD-based inference beyond the generalized linear model assumption. The contribution is somewhat incremental, as it builds directly on the existing ISGD literature.
The paper tackles statistical inference for model parameters using implicit stochastic gradient descent (ISGD). It analyzes the proximal Robbins-Monro and proximal Polyak-Ruppert procedures, derives non-asymptotic error bounds and limiting distributions for both, and proposes online estimators of the asymptotic covariance matrices that enable valid confidence intervals.
Implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability compared with (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely the proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds for both the proxRM and proxPR iterates, establish their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. These estimators are then used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that limited previous analyses, and the procedures it employs are feasible. Our on-line covariance matrix estimators appear to be the first of their kind in the ISGD literature.
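To make the two modes concrete, here is a minimal sketch, assuming a least-squares loss l(theta; x, y) = 0.5 * (y - x'theta)^2 and step sizes gamma_n = gamma1 * n^(-alpha) with alpha in (1/2, 1). The proxRM iterate solves the implicit equation theta_n = theta_{n-1} - gamma_n * grad l(theta_n), and the proxPR estimate is the running average of the proxRM iterates. Least squares is chosen only because its implicit step has a closed form; it is a generalized linear model, so it does not illustrate the paper's more general smooth convex setting, where the proximal step would typically be solved numerically. The function name `isgd_linear` and the default `gamma1`, `alpha` values are hypothetical, and the paper's covariance estimators are not reproduced here.

```python
import numpy as np


def isgd_linear(X, y, gamma1=1.0, alpha=0.6):
    """Implicit SGD for least squares, returning (proxRM, proxPR) estimates.

    Illustrative assumptions (not taken from the paper): squared loss
    l(theta; x, y) = 0.5 * (y - x @ theta)**2 and step sizes
    gamma_n = gamma1 * n**(-alpha) with alpha in (1/2, 1).
    """
    n_samples, dim = X.shape
    theta = np.zeros(dim)      # proxRM iterate theta_n
    theta_bar = np.zeros(dim)  # proxPR running average of theta_1..theta_n
    for n in range(1, n_samples + 1):
        x, yn = X[n - 1], y[n - 1]
        gamma = gamma1 * n ** (-alpha)
        # Implicit update: theta_n = theta_{n-1} + gamma * (yn - x @ theta_n) * x.
        # For squared loss it has a closed form: the explicit residual is
        # shrunk by the factor 1 / (1 + gamma * ||x||^2).
        resid = yn - x @ theta
        theta = theta + gamma * resid / (1.0 + gamma * (x @ x)) * x
        # Polyak-Ruppert averaging, updated online in O(dim) per step.
        theta_bar += (theta - theta_bar) / n
    return theta, theta_bar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_star = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(20000, 3))
    y = X @ theta_star + rng.normal(scale=0.5, size=20000)
    rm, pr = isgd_linear(X, y)
    print("proxRM:", np.round(rm, 3))
    print("proxPR:", np.round(pr, 3))
```

The shrinkage factor 1 / (1 + gamma * ||x||^2) is where the stability of ISGD shows up in this special case: even a large early step cannot overshoot the current observation, whereas explicit SGD with the same gamma can diverge. Averaging then reduces the variance of the final-iterate proxRM estimate.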