ML LGDec 20, 2014

Competing with the Empirical Risk Minimizer in a Single Pass

Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

arXiv:1412.6606v2104 citations

Originality Incremental advance

AI Analysis

This work addresses computational efficiency challenges for practitioners in machine learning and statistics who need scalable, low-resource estimation methods, though it is incremental as it builds on existing ERM frameworks.

The paper tackles the problem of minimizing computational resources like time and space while matching the statistical performance of the empirical risk minimizer (ERM) in estimation tasks such as linear and logistic regression. It presents a streaming algorithm that achieves the same statistical convergence rate as the ERM with linear time and space usage in a single pass, and quantifies the finite-sample rate at which it becomes competitive.

In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties. Our goal in this work is to perform as well as the ERM, on every problem, while minimizing the use of computational resources such as running time and space usage. We provide a simple streaming algorithm which, under standard regularity assumptions on the underlying problem, enjoys the following properties: * The algorithm can be implemented in linear time with a single pass of the observed data, using space linear in the size of a single sample. * The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem, even considering constant factors. * The algorithm's performance depends on the initial error at a rate that decreases super-polynomially. * The algorithm is easily parallelizable. Moreover, we quantify the (finite-sample) rate at which the algorithm becomes competitive with the ERM.

View on arXiv PDF

Similar