ML · Mar 19, 2017

Practical Coreset Constructions for Machine Learning

arXiv:1703.06476v2 · 203 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey that compiles and organizes existing methods for building coresets to improve efficiency in machine learning tasks.

The paper provides an overview of state-of-the-art coreset constructions for machine learning, presenting a theoretical framework and applying it to k-means clustering, while summarizing existing algorithms for various problems like mixture models and regression.

We investigate coresets (succinct, small summaries of large data sets) built so that solutions found on the summary are provably competitive with solutions found on the full data set. We provide an overview of the state of the art in coreset construction for machine learning. In Section 2, we present both the intuition behind coresets and a theoretically sound framework for constructing them for general problems, and we apply this framework to $k$-means clustering. In Section 3, we summarize existing coreset construction algorithms for a variety of machine learning problems, such as maximum likelihood estimation of mixture models, Bayesian non-parametric models, principal component analysis, regression, and general empirical risk minimization.
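To make the idea of a weighted summary concrete, here is a minimal Python sketch of one simple importance-sampling scheme for $k$-means. It is an illustration under stated assumptions, not necessarily the construction given in the paper's Section 2: the sampling distribution (half uniform, half proportional to squared distance from the data mean) and the function names `importance_sample_coreset` and `kmeans_cost` are choices made here for the example. Each sampled point receives weight 1/(m·q), so the weighted cost on the summary is an unbiased estimate of the cost on the full data set for any fixed set of centers.

```python
import numpy as np


def importance_sample_coreset(X, m, seed=0):
    """Sample m weighted points from X (hypothetical helper, illustration only)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Squared distance of every point to the mean of the data.
    sq_dist = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    total = sq_dist.sum()
    if total > 0:
        # Assumed proposal: mix uniform sampling with distance-based sampling.
        q = 0.5 / n + 0.5 * sq_dist / total
    else:
        # All points coincide, so fall back to uniform sampling.
        q = np.full(n, 1.0 / n)
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights keep weighted sums unbiased.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights


def kmeans_cost(X, centers, weights=None):
    """(Weighted) sum of squared distances to the nearest center."""
    diffs = X[:, None, :] - centers[None, :, :]
    d2 = (diffs ** 2).sum(axis=2).min(axis=1)
    if weights is None:
        return d2.sum()
    return (weights * d2).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic two-cluster data, used only for this demonstration.
    X = np.vstack([rng.normal(0.0, 1.0, (2500, 2)),
                   rng.normal(4.0, 1.0, (2500, 2))])
    centers = np.array([[0.0, 0.0], [4.0, 4.0]])
    C, w = importance_sample_coreset(X, m=500)
    print("full cost:   ", kmeans_cost(X, centers))
    print("summary cost:", kmeans_cost(C, centers, w))
```

The two printed costs should be close for a reasonably large sample size m. The provable guarantees described in the abstract depend on the specific construction and sample-size bounds in the paper, which this sketch does not reproduce.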
