LG Feb 26, 2014

Exploiting the Statistics of Learning and Inference

arXiv:1402.7025v2 (1 citation)

Originality: Incremental advance

AI Analysis

This work addresses computational bottlenecks for researchers and practitioners dealing with big data and simulations, but it appears incremental as it builds on existing Bayesian methods.

The paper tackles computational challenges in learning and inference for large datasets and simulations by proposing algorithms that exploit statistical redundancy through subsampling and uncertainty reasoning, resulting in improved efficiency for gradient estimation and MCMC sampling.

When dealing with datasets containing a billion instances or with simulations that require a supercomputer to execute, computational resources become part of the equation. We can improve the efficiency of learning and inference by exploiting their inherent statistical nature. We propose algorithms that exploit the redundancy of data relative to a model by subsampling data-cases for every update and reasoning about the uncertainty created in this process. In the context of learning we propose to test for the probability that a stochastically estimated gradient points more than 180 degrees in the wrong direction. In the context of MCMC sampling we use stochastic gradients to improve the efficiency of MCMC updates, and hypothesis tests based on adaptive mini-batches to decide whether to accept or reject a proposed parameter update. Finally, we argue that in the context of likelihood-free MCMC one needs to store all the information revealed by all simulations, for instance in a Gaussian process. We conclude that Bayesian methods will continue to play a crucial role in the era of big data and big simulations, but only if we overcome a number of computational challenges.
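To make the accept/reject idea concrete, here is a minimal Python sketch of one way a hypothesis test on adaptive mini-batches could be organized. It is an illustration under stated assumptions, not the paper's algorithm: the function name `minibatch_accept`, the parameters `batch_size` and `eps`, the normal approximation to the test statistic, and the finite-population correction are all choices made for this example. The Metropolis-Hastings decision is written as a comparison between the mean per-data-case log-likelihood difference and a threshold `mu0`; the mini-batch grows until the comparison can be made with error probability below `eps`, and the full dataset is used only as a fallback.

```python
import math
import random


def minibatch_accept(log_lik_diff, N, mu0, batch_size=100, eps=0.05, rng=random):
    """Decide whether the mean of log_lik_diff(i) over i in range(N) exceeds mu0.

    log_lik_diff(i) is assumed to return
    log p(x_i | proposed) - log p(x_i | current) for data-case i.
    Returns (accept, n_used), where n_used is the number of data-cases evaluated.
    """
    if N < 1:
        raise ValueError("need at least one data-case")
    order = list(range(N))
    rng.shuffle(order)  # visit data-cases in random order, without replacement
    n, total, total_sq = 0, 0.0, 0.0
    while n < N:
        new_n = min(n + batch_size, N)
        for i in order[n:new_n]:  # grow the mini-batch
            d = log_lik_diff(i)
            total += d
            total_sq += d * d
        n = new_n
        if n == N or n < 2:
            continue  # either all data used, or too few points for a variance
        mean = total / n
        var = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
        # Standard error of the mean, with a finite-population correction
        # because we sample without replacement from N data-cases.
        se = math.sqrt(var / n) * math.sqrt(1.0 - (n - 1) / (N - 1))
        if se == 0.0:
            continue  # no spread observed yet: be conservative, look at more data
        z = abs(mean - mu0) / se
        delta = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided normal tail
        if delta < eps:
            return mean > mu0, n  # confident enough to decide early
    return total / N > mu0, N  # fallback: exact decision on the full dataset


# Hypothetical usage: unit-variance Gaussian likelihood with unknown mean,
# flat prior and symmetric proposal, so mu0 = log(u) / N.
demo_rng = random.Random(0)
data = [demo_rng.gauss(1.0, 1.0) for _ in range(100_000)]
theta_cur, theta_new = 0.0, 1.0


def diff(i):
    x = data[i]
    return -0.5 * (x - theta_new) ** 2 + 0.5 * (x - theta_cur) ** 2


u = 1.0 - demo_rng.random()  # uniform draw in (0, 1], so log(u) is finite
mu0 = math.log(u) / len(data)
accept, n_used = minibatch_accept(diff, len(data), mu0, rng=demo_rng)
print(accept, n_used)
```

When the proposal is clearly better or clearly worse than the current state, the test typically stops after one or two mini-batches; only borderline proposals cost a full pass over the data. The price is a small, controllable bias in the sampled distribution, governed by `eps`.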
