ML | LG | Apr 7, 2021

Online Feature Screening for Data Streams with Concept Drift

arXiv:2104.02883v1 | 8 citations
Originality: Incremental advance
AI Analysis

This work addresses feature selection for classification datasets with streaming input, sparsity, and concept drift, representing an incremental improvement over existing online methods.

The paper tackles feature selection for data streams with concept drift. It proposes online screening methods that match offline feature importance with faster speed and less storage. It also shows that integrated model adaptation improves true feature detection rates and, on real datasets, offers advantages in computing time, space, model complexity, or accuracy.

Screening feature selection methods are often used as a preprocessing step to reduce the number of variables before the training step. Traditional screening methods focus only on complete high-dimensional datasets. Modern datasets not only have higher dimension and larger sample size, but also have properties such as streaming input, sparsity, and concept drift. A considerable number of online feature selection methods have therefore been introduced in recent years to handle these kinds of problems. Online screening methods are one category of online feature selection methods. The methods proposed in this research are capable of handling all three situations mentioned above. Our study focuses on classification datasets. Our experiments show that the proposed methods generate the same feature importance as their offline versions, with faster speed and less storage consumption. Furthermore, the results show that online screening methods with integrated model adaptation have a higher true feature detection rate than those without model adaptation on data streams with concept drift. On the two large real datasets that potentially exhibit concept drift, online screening methods with model adaptation show advantages in saving computing time and space, reducing model complexity, or improving prediction accuracy.
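To make the "same feature importance as the offline version, with less storage" claim concrete, here is a minimal Python sketch of one way an online screener can work. It is not the paper's algorithm. The class name `OnlineFisherScreener`, the Fisher-style score, the sparse `{feature_index: value}` input format, and the `decay` forgetting factor (used here as a simple stand-in for model adaptation under concept drift) are all assumptions made for illustration. The idea is that per-class running sums are sufficient statistics, so the score can be updated one sample at a time without storing the stream.

```python
from collections import defaultdict

EPS = 1e-12  # guards against division by zero for constant features


class OnlineFisherScreener:
    """Hypothetical streaming screener for binary labels (0 or 1)."""

    def __init__(self, decay=1.0):
        # decay == 1.0 reproduces the offline statistic;
        # decay < 1.0 down-weights old samples (assumed drift handling).
        self.decay = decay
        self.n = [0.0, 0.0]  # weighted sample count per class
        self.s1 = [defaultdict(float), defaultdict(float)]  # sum of x
        self.s2 = [defaultdict(float), defaultdict(float)]  # sum of x^2

    def update(self, x, y):
        """Absorb one sparse sample x = {feature_index: value} with label y."""
        if self.decay != 1.0:
            # Simple but O(features seen) per update; fine for a sketch.
            for c in (0, 1):
                self.n[c] *= self.decay
                for j in self.s1[c]:
                    self.s1[c][j] *= self.decay
                    self.s2[c][j] *= self.decay
        self.n[y] += 1.0
        for j, v in x.items():  # zeros are implicit, only nonzeros are touched
            self.s1[y][j] += v
            self.s2[y][j] += v * v

    def score(self, j):
        """Squared difference of class means over the sum of class variances."""
        if min(self.n) == 0.0:
            return 0.0
        means, variances = [], []
        for c in (0, 1):
            m = self.s1[c].get(j, 0.0) / self.n[c]
            second_moment = self.s2[c].get(j, 0.0) / self.n[c]
            means.append(m)
            variances.append(max(second_moment - m * m, 0.0))
        return (means[0] - means[1]) ** 2 / (variances[0] + variances[1] + EPS)

    def top_k(self, k):
        """Indices of the k highest-scoring features seen so far."""
        seen = set(self.s1[0]) | set(self.s1[1])
        return sorted(seen, key=self.score, reverse=True)[:k]


def offline_score(samples, labels, j):
    """Two-pass batch version of the same score, for comparison only."""
    means, variances = [], []
    for c in (0, 1):
        vals = [x.get(j, 0.0) for x, y in zip(samples, labels) if y == c]
        m = sum(vals) / len(vals)
        means.append(m)
        variances.append(sum((v - m) ** 2 for v in vals) / len(vals))
    return (means[0] - means[1]) ** 2 / (variances[0] + variances[1] + EPS)
```

With `decay=1.0`, calling `update` once per arriving sample and then `score(j)` gives the same value as `offline_score` on the full batch, up to floating-point rounding, while storing only a few numbers per feature. Setting `decay` below 1.0 lets recent samples dominate, which is one common way to track a drifting stream; the paper's own adaptation mechanism may differ.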

