ML · LG · Oct 16, 2020

Feature Selection for Huge Data via Minipatch Learning

arXiv:2010.08529v2 · 11 citations
Originality: Incremental advance
AI Analysis

This paper addresses feature selection for huge datasets with millions of observations and features, aiming for more reliable interpretation and faster computation. The advance is incremental because it builds on existing feature selection methods rather than replacing them.

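The paper tackles two problems of feature selection in huge-data settings: computational intractability and degraded statistical accuracy under high noise and high correlation. It proposes Stable Minipatch Selection (STAMPS) and Adaptive STAMPS (AdaSTAMPS), meta-algorithms that ensemble the selection events of base feature selectors trained on many tiny random subsets of both observations and features (minipatches). Empirically, the proposed approaches, especially AdaSTAMPS, dominate competing methods in feature selection accuracy and computational time.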

Feature selection often leads to increased model interpretability, faster computation, and improved model performance by discarding irrelevant or redundant features. While feature selection is a well-studied problem with many widely-used techniques, there are typically two key challenges: i) many existing approaches become computationally intractable in huge-data settings with millions of observations and features; and ii) the statistical accuracy of selected features degrades in high-noise, high-correlation settings, thus hindering reliable model interpretation. We tackle these problems by proposing Stable Minipatch Selection (STAMPS) and Adaptive STAMPS (AdaSTAMPS). These are meta-algorithms that build ensembles of selection events of base feature selectors trained on many tiny, (adaptively-chosen) random subsets of both the observations and features of the data, which we call minipatches. Our approaches are general and can be employed with a variety of existing feature selection strategies and machine learning techniques. In addition, we provide theoretical insights on STAMPS and empirically demonstrate that our approaches, especially AdaSTAMPS, dominate competing methods in terms of feature selection accuracy and computational time.
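The minipatch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `minipatch_selection`, its parameters, the marginal-correlation base selector, and the rule "keep features whose selection frequency reaches a fixed threshold" are all assumptions made for illustration. It covers only uniform (non-adaptive) sampling; the adaptive sampling scheme of AdaSTAMPS is not shown.

```python
import numpy as np


def minipatch_selection(X, y, n_patches=400, n_obs=50, n_feats=10,
                        top_s=3, threshold=0.5, seed=0):
    """Hypothetical sketch of ensemble feature selection over minipatches.

    Each minipatch is a uniform random subset of rows AND columns.
    The base selector here (top_s features by absolute correlation with y)
    is a stand-in; any feature selector could be plugged in instead.
    """
    rng = np.random.default_rng(seed)
    n_total, m_total = X.shape
    times_sampled = np.zeros(m_total)   # how often each feature entered a minipatch
    times_selected = np.zeros(m_total)  # how often it was then picked by the base selector

    for _ in range(n_patches):
        rows = rng.choice(n_total, size=n_obs, replace=False)
        cols = rng.choice(m_total, size=n_feats, replace=False)
        X_patch = X[np.ix_(rows, cols)]
        y_patch = y[rows]

        # Stand-in base selector: absolute Pearson correlation with the response.
        Xc = X_patch - X_patch.mean(axis=0)
        yc = y_patch - y_patch.mean()
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
        score = np.abs(Xc.T @ yc) / denom
        picked = cols[np.argsort(score)[-top_s:]]

        times_sampled[cols] += 1
        times_selected[picked] += 1

    # Assumed aggregation rule: selections divided by the number of times
    # the feature was available, then thresholded.
    freq = times_selected / np.maximum(times_sampled, 1)
    return np.flatnonzero(freq >= threshold), freq


# Synthetic demo: only the first three of 40 features drive the response.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.standard_normal((500, 40))
y_demo = 3.0 * X_demo[:, :3].sum(axis=1) + demo_rng.standard_normal(500)
selected, freq = minipatch_selection(X_demo, y_demo)
```

Because every base fit sees only `n_obs` rows and `n_feats` columns, the cost per fit is independent of the full data size, and the fits are trivially parallelizable. The values of `n_patches`, `n_obs`, `n_feats`, and `threshold` above are arbitrary demo settings, not recommendations from the paper.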

