COML · Sep 26, 2017

PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference

arXiv:1709.09216v3 · 35 citations
Originality: Incremental advance
AI Analysis

This work addresses scalable, theoretically sound Bayesian inference for GLMs. It is rated incremental because it builds on existing methods, while offering new guarantees and efficiency.

The authors address the scalability and theoretical-guarantee gaps in Bayesian generalized linear model inference with PASS-GLM, a method based on polynomial approximate sufficient statistics. In logistic regression experiments it performed competitively with stochastic gradient descent, MCMC, and the Laplace approximation, including on an advertising data set with 40 million data points and 20,000 covariates.

Generalized linear models (GLMs) -- such as logistic regression, Poisson regression, and robust regression -- provide interpretable models for diverse data types. Probabilistic approaches, particularly Bayesian ones, allow coherent estimates of uncertainty, incorporation of prior information, and sharing of power across experiments via hierarchical models. In practice, however, the approximate Bayesian methods necessary for inference have either failed to scale to large data sets or failed to provide theoretical guarantees on the quality of inference. We propose a new approach based on constructing polynomial approximate sufficient statistics for GLMs (PASS-GLM). We demonstrate that our method admits a simple algorithm as well as trivial streaming and distributed extensions that do not compound error across computations. We provide theoretical guarantees on the quality of point (MAP) estimates, the approximate posterior, and posterior mean and uncertainty estimates. We validate our approach empirically in the case of logistic regression using a quadratic approximation and show competitive performance with stochastic gradient descent, MCMC, and the Laplace approximation in terms of speed and multiple measures of accuracy -- including on an advertising data set with 40 million data points and 20,000 covariates.
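The abstract's central idea can be illustrated with a minimal sketch for logistic regression with a quadratic approximation. Everything named here is an assumption made for illustration and is not taken from the paper or its repository: labels are coded as -1/+1, the logistic log-likelihood term phi(s) = -log(1 + exp(-s)) is fitted by ordinary least squares on a grid over [-radius, radius] (the paper's own polynomial construction may differ), the prior is an isotropic Gaussian, and the names `fit_quadratic`, `QuadraticStats`, and `gaussian_posterior` are hypothetical. Once phi is replaced by b0 + b1*s + b2*s^2, the data enter the approximate log-likelihood only through two sums, so batches can be processed in a stream or on separate workers and combined by plain addition.

```python
import numpy as np


def fit_quadratic(radius=4.0, grid_size=2001):
    """Least-squares quadratic fit to phi(s) = -log(1 + exp(-s)) on [-radius, radius].

    Assumption: a simple grid-based least-squares fit stands in for whatever
    polynomial approximation the paper actually uses.
    """
    s = np.linspace(-radius, radius, grid_size)
    phi = -np.logaddexp(0.0, -s)  # numerically stable log-sigmoid
    b0, b1, b2 = np.polynomial.polynomial.polyfit(s, phi, 2)  # lowest order first
    return b0, b1, b2


class QuadraticStats:
    """Approximate sufficient statistics for labels y in {-1, +1}."""

    def __init__(self, dim):
        self.n = 0
        self.t1 = np.zeros(dim)         # sum_n y_n * x_n
        self.t2 = np.zeros((dim, dim))  # sum_n x_n x_n^T (y_n^2 = 1)

    def update(self, X, y):
        """Fold one batch into the running sums (streaming update)."""
        self.n += X.shape[0]
        self.t1 += X.T @ y
        self.t2 += X.T @ X
        return self

    def merge(self, other):
        """Combine statistics from two workers; addition adds no extra error."""
        out = QuadraticStats(self.t1.shape[0])
        out.n = self.n + other.n
        out.t1 = self.t1 + other.t1
        out.t2 = self.t2 + other.t2
        return out


def gaussian_posterior(stats, b1, b2, prior_var=1.0):
    """Gaussian approximate posterior under a N(0, prior_var * I) prior.

    log posterior ~ b1 * theta^T t1 + b2 * theta^T t2 theta - |theta|^2 / (2 prior_var),
    so the precision is I / prior_var - 2 * b2 * t2 (b2 is negative).
    """
    dim = stats.t1.shape[0]
    precision = np.eye(dim) / prior_var - 2.0 * b2 * stats.t2
    cov = np.linalg.inv(precision)
    mean = cov @ (b1 * stats.t1)
    return mean, cov


# Usage on synthetic data: two batches are processed separately, then merged.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
theta_true = np.array([1.0, -0.5, 0.25])
y = np.where(rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ theta_true)), 1.0, -1.0)

b0, b1, b2 = fit_quadratic()
part_a = QuadraticStats(3).update(X[:400], y[:400])
part_b = QuadraticStats(3).update(X[400:], y[400:])
post_mean, post_cov = gaussian_posterior(part_a.merge(part_b), b1, b2)
```

In this sketch the quadratic fit is only trustworthy when the inner products x·theta stay within the fitting interval, so `radius` is a tuning choice rather than a recommended value.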

Code Implementations: 1 repo
