ML

Jan 9, 2015

Survey schemes for stochastic gradient descent with applications to M-estimation

Stéphan Clémençon, Patrice Bertail, Emilie Chautru, Guillaume Papa

arXiv:1501.02218v11.53 citations

Originality: Incremental advance

AI Analysis

This work addresses large-scale statistical and machine-learning problems in the Big Data era, offering an incremental improvement by optimizing survey schemes for stochastic gradient descent.

The paper tackles the challenge of computing statistics on massive datasets by investigating the impact of survey sampling with unequal inclusion probabilities on stochastic gradient descent-based M-estimation methods. It proves that using appropriate first-order inclusion probabilities can significantly increase asymptotic accuracy without affecting complexity, as supported by limit theorems and numerical experiments.

In certain situations that will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full sample is hardly feasible, if not infeasible. A natural approach in this context consists in using survey schemes and substituting the "full data" statistics with their counterparts based on the resulting random samples, which are of manageable size. The main purpose of this paper is to investigate the impact of survey sampling with unequal inclusion probabilities on stochastic gradient descent-based M-estimation methods in large-scale statistical and machine-learning problems. Precisely, we prove that, in the presence of some a priori information, one may significantly increase asymptotic accuracy by choosing appropriate first-order inclusion probabilities, without affecting complexity. These striking results are described here by limit theorems and are also illustrated by numerical experiments.
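The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes Poisson sampling and a Horvitz-Thompson weighted gradient, which are standard survey-sampling tools but are not named in the abstract. The least-squares loss, the inclusion probabilities proportional to the norm of each feature vector, the step-size schedule, and all function names are illustrative assumptions, not the authors' method or their optimal choice of probabilities.

```python
import numpy as np


def inclusion_probabilities(aux, expected_size):
    """First-order inclusion probabilities proportional to an auxiliary variable.

    Assumption: `aux` is a positive per-unit score known a priori.
    Probabilities are capped at 1, so the expected sample size can fall
    below `expected_size` if the cap is hit.
    """
    return np.minimum(1.0, expected_size * aux / aux.sum())


def ht_gradient(theta, X, y, pi, rng):
    """Horvitz-Thompson estimate of the mean least-squares gradient."""
    N = X.shape[0]
    # Poisson sampling: unit i enters the sample independently with prob. pi[i].
    mask = rng.random(N) < pi
    Xs, ys, ps = X[mask], y[mask], pi[mask]
    residual = Xs @ theta - ys
    # Dividing each sampled gradient by pi_i makes the estimate unbiased
    # for the full-data mean gradient (1/N) * sum_i x_i (x_i^T theta - y_i).
    return (Xs * (residual / ps)[:, None]).sum(axis=0) / N


def ht_sgd(X, y, pi, n_steps=3000, seed=0):
    """SGD where each step uses a fresh survey sample of the data."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for t in range(1, n_steps + 1):
        step = 1.0 / (t + 10)  # decreasing step size (illustrative schedule)
        theta -= step * ht_gradient(theta, X, y, pi, rng)
    return theta


# Synthetic example: N = 5000 units, about 50 of them sampled per step.
data_rng = np.random.default_rng(42)
N, d = 5000, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = data_rng.normal(size=(N, d))
y = X @ theta_true + 0.1 * data_rng.normal(size=N)

# Hypothetical a priori information: the norm of each feature vector.
pi = inclusion_probabilities(np.linalg.norm(X, axis=1), expected_size=50)
theta_hat = ht_sgd(X, y, pi)
print(theta_hat)
```

Each step here touches only about 50 of the 5000 units, so the per-iteration cost matches that of uniform sampling with the same expected sample size; only the probabilities, and the matching weights, differ.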
