LG, AI, ML | Oct 20, 2020

Model-specific Data Subsampling with Influence Functions

Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

arXiv:2010.10218v13.32 citations

Originality: Incremental advance

AI Analysis

The paper addresses the problem of time-consuming model selection for practitioners working with expensive models and large datasets. The advance is incremental, since it builds on existing influence-function methods.

The paper tackles the computational inefficiency of model selection on large datasets by developing a model-specific data subsampling strategy using influence functions, demonstrating empirically that it quickly selects high-quality models.

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performance. In modern applications of machine learning, the models being considered are increasingly expensive to evaluate and the datasets of interest are growing in size. As a result, the process of model selection is time-consuming and computationally inefficient. In this work, we develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence. Specifically, we leverage influence functions to guide our selection strategy, proving theoretically and demonstrating empirically that our approach quickly selects high-quality models.
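
To make the idea of influence-guided subsampling concrete, here is a minimal sketch for a one-parameter least-squares model y ≈ w·x. It is not the authors' algorithm. It assumes a generic recipe: fit a pilot model on a small uniform sample, score each point by the magnitude of its parameter influence |H⁻¹ g_i|, sample in proportion to that score (mixed with a uniform distribution), and reweight by inverse probability. The function names and the `mix` and `pilot_size` parameters are hypothetical.

```python
import random


def fit_slope(xs, ys, weights=None):
    """Weighted least-squares slope for y ~ w * x (no intercept)."""
    if weights is None:
        weights = [1.0] * len(xs)
    num = sum(c * x * y for c, x, y in zip(weights, xs, ys))
    den = sum(c * x * x for c, x in zip(weights, xs))
    return num / den


def influence_scores(xs, ys, w):
    """Magnitude of the parameter influence |H^-1 * g_i| for each point,
    where g_i is the gradient of 0.5 * (w * x_i - y_i)^2 with respect to w
    and H is the average Hessian (the mean of x_i^2)."""
    hessian = sum(x * x for x in xs) / len(xs)
    return [abs((w * x - y) * x) / hessian for x, y in zip(xs, ys)]


def sampling_probs(scores, mix=0.1):
    """Blend influence-proportional probabilities with a uniform floor
    so that every point keeps a nonzero chance of being drawn."""
    n = len(scores)
    total = sum(scores)
    if total == 0.0:  # pilot fits every point exactly: fall back to uniform
        return [1.0 / n] * n
    return [(1.0 - mix) * s / total + mix / n for s in scores]


def influence_subsample(xs, ys, m, rng, pilot_size=50, mix=0.1):
    """Return (indices, importance_weights) for a subsample of size m."""
    n = len(xs)
    # Pilot fit on a small uniform sample (an assumption of this sketch).
    pilot = rng.sample(range(n), min(pilot_size, n))
    w_pilot = fit_slope([xs[i] for i in pilot], [ys[i] for i in pilot])
    probs = sampling_probs(influence_scores(xs, ys, w_pilot), mix)
    # Draw with replacement, then weight each draw by 1 / p_i.
    idx = rng.choices(range(n), weights=probs, k=m)
    weights = [1.0 / probs[i] for i in idx]
    return idx, weights
```

Candidate models could then be fitted on the weighted subsample, for instance with `fit_slope([xs[i] for i in idx], [ys[i] for i in idx], weights)`, rather than on the full dataset. The uniform floor is a design choice of this sketch: it caps the importance weights so that a few low-influence points cannot dominate the weighted fit.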
