Approximate Newton-based statistical inference using only stochastic gradients
This work provides a scalable inference framework for practitioners working on large-scale machine learning, though it builds incrementally on existing stochastic Newton and finite-difference ideas.
The paper tackles statistical inference in convex empirical risk minimization by developing an approximate stochastic Newton method. The method estimates the statistical error covariance without exact second-order information and without resampling the full data set. It covers both M-estimation and LASSO regression, and it proves effective on large-scale problems, including the detection of adversarial attacks on neural networks.
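The central primitive, an approximate Newton direction $H^{-1} g$ computed from gradients alone, can be illustrated with a generic sketch. One reason such directions carry inferential information: for a random gradient $g$, $\mathrm{Cov}(H^{-1} g) = H^{-1}\,\mathrm{Cov}(g)\,H^{-1}$, which has the same "sandwich" form as the M-estimation covariance. The sketch rests on several assumptions that are ours, not the paper's. It uses an L2-regularized logistic regression loss on a minibatch. Each Hessian-vector product is approximated by a central finite difference of gradients. Conjugate gradient serves as the inner solver. The functions `logistic_grad`, `fd_hvp`, and `approx_newton_direction` are hypothetical names for this illustration, and the paper's own update rule may differ.

```python
import numpy as np


def logistic_grad(theta, X, y, lam=1e-3):
    """Gradient of the L2-regularized average logistic loss on the rows (X, y)."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - y) / X.shape[0] + lam * theta


def fd_hvp(grad_fn, theta, v, delta=1e-5):
    """Approximate H(theta) @ v with a central finite difference of gradients.

    v is normalized first so the step delta has a fixed size along the
    direction, then the result is rescaled by ||v|| (H @ v is linear in v).
    """
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(v)
    u = v / norm_v
    diff = grad_fn(theta + delta * u) - grad_fn(theta - delta * u)
    return norm_v * diff / (2.0 * delta)


def approx_newton_direction(grad_fn, theta, g, max_iter=50, rtol=1e-8):
    """Approximately solve H(theta) d = g by conjugate gradient.

    Only gradient evaluations are used: every Hessian-vector product
    inside CG comes from fd_hvp, so no Hessian matrix is ever formed.
    """
    d = np.zeros_like(g)
    r = g.copy()              # residual g - H d, with d = 0 initially
    p = r.copy()              # current search direction
    rs = r @ r
    stop = (rtol * np.linalg.norm(g)) ** 2
    for _ in range(max_iter):
        if rs <= stop:
            break
        Hp = fd_hvp(grad_fn, theta, p)
        alpha = rs / (p @ Hp)
        d = d + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d


# Usage: one approximate Newton direction on a random minibatch (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (rng.random(500) < 0.5).astype(float)
theta = 0.1 * rng.normal(size=5)
batch = rng.choice(500, size=100, replace=False)


def grad_fn(t):
    return logistic_grad(t, X[batch], y[batch])


g = grad_fn(theta)
d = approx_newton_direction(grad_fn, theta, g)
```

Each CG iteration costs two minibatch gradient evaluations and stores only vectors of length $p$, so no $p \times p$ Hessian is ever formed. That is the property that makes this style of computation attractive when the parameter dimension is large.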
We present a novel statistical inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and approximates Hessian-vector products from first-order information alone. In theory, our method efficiently computes the statistical error covariance in $M$-estimation, both for unregularized convex learning problems and high-dimensional LASSO regression, without using exact second-order information or resampling the entire data set. We also present a stochastic gradient sampling scheme for statistical inference in non-i.i.d. time series analysis, where we sample contiguous blocks of indices. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems that even go beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.
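For non-i.i.d. time series, sampling contiguous blocks of indices keeps short-range temporal dependence intact inside each minibatch, in the spirit of the moving-block bootstrap. The following is a minimal sketch, not the paper's exact scheme. It assumes block starts drawn uniformly without wrap-around, with the block length and the number of blocks chosen by the user. `sample_block_indices` and the AR(1) least-squares example are hypothetical illustrations.

```python
import numpy as np


def sample_block_indices(n, block_len, n_blocks, rng):
    """Return indices of n_blocks contiguous blocks of length block_len from range(n).

    Block starts are drawn uniformly from 0..n - block_len (no wrap-around),
    so each block preserves the order of consecutive observations.
    """
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and n")
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    # Row i holds start_i, start_i + 1, ..., start_i + block_len - 1.
    return (starts[:, None] + np.arange(block_len)[None, :]).ravel()


# Usage: a block-sampled stochastic gradient for an AR(1) least-squares fit,
# loss_t(phi) = 0.5 * (phi * y[t-1] - y[t])**2, on a simulated series.
rng = np.random.default_rng(1)
n = 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal()

# Sample over the n - 1 usable targets, then shift so every t satisfies t >= 1.
idx = sample_block_indices(n - 1, block_len=20, n_blocks=5, rng=rng) + 1
phi = 0.0
grad = np.mean((phi * y[idx - 1] - y[idx]) * y[idx - 1])
```

Longer blocks retain more of the dependence structure but leave fewer effectively independent blocks per minibatch. The block length is therefore a tuning choice, and the sketch does not settle it.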