ML · LG · Jun 8, 2017

Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

arXiv:1706.02524v2 · 55 citations
AI Analysis

This work addresses the problem of scaling automated statistical modeling for researchers and practitioners dealing with large datasets, representing an incremental improvement over the original Automatic Statistician.

The paper tackles the scalability of the Automatic Statistician, which uses Gaussian Processes for automated statistical modeling but suffers from O(N^3) runtime during model selection. It proposes Scalable Kernel Composition (SKC) to extend the method to larger datasets, and shows that a derived upper bound on the marginal likelihood is significantly tighter than the variational lower bound, which makes it useful for model selection.

Automating statistical modelling is a challenging problem in artificial intelligence. The Automatic Statistician takes a first step in this direction, by employing a kernel search algorithm with Gaussian Processes (GP) to provide interpretable statistical models for regression problems. However, this does not scale due to its $O(N^3)$ running time for the model selection. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets. In doing so, we derive a cheap upper bound on the GP marginal likelihood that sandwiches the marginal likelihood with the variational lower bound. We show that the upper bound is significantly tighter than the lower bound and thus useful for model selection.
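To make the quantities in the abstract concrete, the following is a minimal NumPy sketch of the exact GP log marginal likelihood (the $O(N^3)$ computation) and the standard variational lower bound based on $M$ inducing points (Titsias, 2009). It is an illustration under stated assumptions, not the paper's implementation: the RBF kernel, its hyperparameters, the toy data, and all function names are hypothetical choices made here. The paper's upper bound is not reproduced, since the abstract does not state its form.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs (an assumed kernel choice)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gaussian_log_density(y, cov):
    """log N(y | 0, cov) via a Cholesky factorisation, which costs O(N^3)."""
    n = y.shape[0]
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y)  # chol^{-1} y
    log_det_half = np.log(np.diag(chol)).sum()  # equals 0.5 * log|cov|
    return -0.5 * alpha @ alpha - log_det_half - 0.5 * n * np.log(2 * np.pi)


def exact_log_marginal(x, y, noise_var):
    """Exact GP log marginal likelihood log N(y | 0, K + noise_var * I)."""
    k = rbf_kernel(x, x)
    return gaussian_log_density(y, k + noise_var * np.eye(len(x)))


def variational_lower_bound(x, y, z, noise_var, jitter=1e-8):
    """Variational lower bound with inducing inputs z.

    For readability this forms the N x N Nystrom matrix Q explicitly, so it
    is still O(N^3); scalable implementations apply the matrix inversion
    lemma to reach O(N M^2).
    """
    k_mm = rbf_kernel(z, z) + jitter * np.eye(len(z))
    k_nm = rbf_kernel(x, z)
    q = k_nm @ np.linalg.solve(k_mm, k_nm.T)  # Q = K_nm K_mm^{-1} K_mn
    fit = gaussian_log_density(y, q + noise_var * np.eye(len(x)))
    k_trace = np.trace(rbf_kernel(x, x))
    trace_penalty = 0.5 * (k_trace - np.trace(q)) / noise_var
    return fit - trace_penalty


# Toy regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.3 * rng.standard_normal(x.shape[0])
z = np.linspace(0.0, 10.0, 5)  # M = 5 inducing inputs

exact = exact_log_marginal(x, y, noise_var=0.1)
lower = variational_lower_bound(x, y, z, noise_var=0.1)
print(f"exact log marginal likelihood: {exact:.3f}")
print(f"variational lower bound:       {lower:.3f}")
```

The lower bound can never exceed the exact value, and the gap grows as fewer inducing points are used. An upper bound on the other side, as the abstract describes, would bracket the exact value without ever computing it.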

