LG ML · Jun 7, 2013

Loss-Proportional Subsampling for Subsequent ERM

arXiv:1306.1840v2 · 8 citations

AI Analysis

This addresses data efficiency for machine learning practitioners, but appears incremental as it builds on existing subsampling and ERM methods.

The paper tackles the problem of reducing dataset size before empirical risk minimization (ERM). It proposes a loss-proportional subsampling scheme that guarantees an excess risk comparing favorably with that obtained from the full dataset, and it demonstrates practical benefits on a large dataset fit with boosted trees, although the abstract reports no concrete figures.

We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees.
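To make the general idea concrete, here is a minimal sketch of loss-proportional subsampling with inverse-probability (importance) weights. It assumes each example already has a loss under some cheap pilot hypothesis; the parameter names `budget` and `p_min` are hypothetical, and this is a generic illustration of the technique, not the authors' exact algorithm. Each example is kept with probability proportional to its pilot loss, floored at `p_min` and capped at 1, and a kept example carries weight 1/p. This weighting makes the weighted sum of any hypothesis's losses over the subsample an unbiased estimate of its summed loss over the full dataset, provided every inclusion probability is positive.

```python
import random
from typing import List, Tuple


def loss_proportional_subsample(
    losses: List[float],
    budget: float,
    p_min: float = 0.01,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (index, importance_weight) pairs for the kept examples.

    losses: per-example loss under a pilot hypothesis (assumed given).
    budget: hypothetical target for the expected sample size; the floor
            and cap below shift the actual expected size somewhat.
    p_min:  hypothetical floor on the inclusion probability, so that
            low-loss examples can still be sampled.
    """
    total = sum(losses)
    rng = random.Random(seed)
    kept: List[Tuple[int, float]] = []
    for i, loss in enumerate(losses):
        # Inclusion probability proportional to the pilot loss.
        p = budget * loss / total if total > 0 else 1.0
        # Floor at p_min and cap at 1 so p is a valid probability.
        p = min(1.0, max(p_min, p))
        if rng.random() < p:
            # Inverse-probability weight keeps the weighted risk unbiased.
            kept.append((i, 1.0 / p))
    return kept


if __name__ == "__main__":
    pilot_losses = [0.05, 2.0, 0.3, 0.0, 1.2, 0.7]
    for idx, w in loss_proportional_subsample(pilot_losses, budget=3):
        print(idx, round(w, 3))
```

The (index, weight) pairs would then be passed to any learner that accepts per-example weights, for instance a boosted-tree implementation that takes sample weights, so that ERM on the subsample approximates ERM on the full data.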

