LGCGFeb 7, 2024

No Dimensional Sampling Coresets for Classification

arXiv:2402.05280v24 citationsh-index: 10ICML
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficient data reduction for machine learning practitioners by offering a general, dimension-independent coreset framework, though it is incremental in refining existing sensitivity sampling methods.

The paper tackles the problem of constructing coresets for classification by introducing the first no-dimensional coresets, where size is independent of data dimension, and provides approximation guarantees for various loss functions using distributional input and i.i.d. samples.

We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the dimension. Moreover, our results are general, apply for distributional input and can use iid samples, so provide sample complexity bounds, and work for a variety of loss functions. A key tool we develop is a Radamacher complexity version of the main sensitivity sampling approach, which can be of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes