LG · AI · ML · Oct 20, 2020

Model-specific Data Subsampling with Influence Functions

arXiv:2010.10218v1 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses time-consuming model selection, a problem for practitioners working with expensive models and large datasets. The advance is incremental because it builds on existing influence function methods.

The paper tackles the computational inefficiency of model selection on large datasets by developing a model-specific data subsampling strategy using influence functions, demonstrating empirically that it quickly selects high-quality models.

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly expensive to evaluate and the datasets of interest are growing in size. As a result, the process of model selection is time-consuming and computationally inefficient. In this work, we develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence. Specifically, we leverage influence functions to guide our selection strategy, proving theoretically and demonstrating empirically that our approach quickly selects high-quality models.
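The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a ridge regression model, the classical first-order influence approximation (the influence of training point i on the validation loss is the negative product of the validation gradient, the inverse Hessian, and the gradient at point i), and a sampling rule that draws points with probability proportional to the absolute influence. The function names `ridge_fit`, `influence_scores`, and `influence_subsample` and the proportional sampling rule are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge fit for the objective mean(0.5*(x.w - y)^2) + 0.5*lam*||w||^2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)      # Hessian of the objective
    w = np.linalg.solve(H, X.T @ y / n)
    return w, H

def influence_scores(X, y, X_val, y_val, lam=1e-2):
    """Approximate influence of each training point on the mean validation loss."""
    w, H = ridge_fit(X, y, lam)
    # Per-example gradients of 0.5*(x.w - y)^2 with respect to w.
    g_train = (X @ w - y)[:, None] * X                              # shape (n, d)
    g_val = ((X_val @ w - y_val)[:, None] * X_val).mean(axis=0)     # shape (d,)
    # influence_i = -g_val^T H^{-1} g_train_i (H is symmetric).
    return -g_train @ np.linalg.solve(H, g_val)

def influence_subsample(X, y, X_val, y_val, m, lam=1e-2, seed=0):
    """Draw m distinct training indices with probability proportional to |influence|.

    Assumption: proportional sampling is one plausible rule, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    scores = np.abs(influence_scores(X, y, X_val, y_val, lam)) + 1e-12
    probs = scores / scores.sum()
    return rng.choice(len(X), size=m, replace=False, p=probs)

# Usage on synthetic data (hypothetical sizes and noise level).
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])
X_train = rng.normal(size=(200, 3))
y_train = X_train @ w_true + 0.1 * rng.normal(size=200)
X_val = rng.normal(size=(50, 3))
y_val = X_val @ w_true + 0.1 * rng.normal(size=50)

subset = influence_subsample(X_train, y_train, X_val, y_val, m=20)
```

If every training point had the same influence, the sampling probabilities would be uniform and this would reduce to random sampling, which matches the abstract's remark that the gain comes from training points having varying influence.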

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
