LG CV ML

Oct 2, 2019

A Geometric Approach to Online Streaming Feature Selection

Salimeh Yasaei Sekeh, Madan Ravi Ganesh, Shurjo Banerjee, Jason J. Corso, Alfred O. Hero

arXiv:1910.01182v22.72 citations

Originality: Incremental advance

AI Analysis

This work improves feature selection for streaming data applications, but it is incremental as it builds on existing OSFS methods.

The authors argue that the central assumption of Online Streaming Feature Selection (OSFS), that data from all samples is available at runtime, is unrealistic, and they introduce a setting in which features and samples stream concurrently (OSFS-SS). They propose Geometric Online Adaption (GOA), which uses a bounded dependency measure and fewer feature comparison steps, and which outperforms baselines including SAOLA. They also argue that the usual way of comparing OSFS algorithms is flawed, and they fix the maximum number of features so that algorithms can be compared fairly on accuracy.

Online Streaming Feature Selection (OSFS) is a sequential learning problem where individual features across all samples are made available to algorithms in a streaming fashion. In this work, firstly, we assert that OSFS's main assumption of having data from all the samples available at runtime is unrealistic, and we introduce a new setting, called OSFS with Streaming Samples (OSFS-SS), in which features and samples are streamed concurrently. Secondly, the primary OSFS method, SAOLA, utilizes an unbounded mutual information measure and requires multiple comparison steps between the stored and incoming feature sets to evaluate a feature's importance. We introduce Geometric Online Adaption (GOA), an algorithm that requires relatively fewer feature comparison steps and uses a bounded conditional geometric dependency measure. Our algorithm outperforms several OSFS baselines, including SAOLA, on a variety of datasets. We also extend SAOLA to work in the OSFS-SS setting and show that GOA continues to achieve the best results. Thirdly, the current paradigm for comparing OSFS algorithms is flawed. Algorithms are measured by comparing the number of features used and the accuracy obtained by the learner, two properties that are fundamentally at odds with one another. Unless a limit is fixed on one of these properties, the quality of the features obtained by different algorithms cannot be compared. We try to rectify this inconsistency by fixing the maximum number of features available to the learner and comparing algorithms in terms of their accuracy. Additionally, we characterize the behaviour of SAOLA and GOA on feature sets derived from popular deep convolutional featurizers.
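
To make the fixed-budget idea concrete, here is a minimal sketch of streaming feature selection under a cap on the number of stored features. It is not GOA or SAOLA: the absolute Pearson correlation is used purely as a stand-in for a bounded dependency score (it lies in [0, 1]), and the names `abs_pearson`, `stream_select`, `max_features`, and `redundancy_cap` are hypothetical, chosen only for this illustration.

```python
import math
import random


def abs_pearson(x, y):
    """Absolute Pearson correlation: a stand-in score bounded in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy) / math.sqrt(sxx * syy)


def stream_select(feature_stream, labels, max_features, redundancy_cap=0.9):
    """Consume (name, values) pairs one at a time; keep at most max_features."""
    kept = []  # entries are (score, name, values)
    for name, values in feature_stream:
        score = abs_pearson(values, labels)
        # Skip the newcomer if a stored feature nearly duplicates it
        # and scores at least as well.
        if any(abs_pearson(values, v) > redundancy_cap and s >= score
               for s, _, v in kept):
            continue
        # Otherwise drop stored near-duplicates that score worse than it.
        kept = [(s, m, v) for s, m, v in kept
                if not (abs_pearson(values, v) > redundancy_cap and s < score)]
        kept.append((score, name, values))
        # Enforce the budget: keep only the top-scoring features.
        kept.sort(key=lambda t: t[0], reverse=True)
        del kept[max_features:]
    return [name for _, name, _ in kept]


# Synthetic stream (assumed data): one informative column, one rescaled
# copy of it, and two pure-noise columns.
random.seed(0)
n = 200
signal = [random.gauss(0, 1) for _ in range(n)]
labels = [1 if s > 0 else 0 for s in signal]
stream = [
    ("noise_a", [random.gauss(0, 1) for _ in range(n)]),
    ("signal", signal),
    ("signal_copy", [2 * s for s in signal]),
    ("noise_b", [random.gauss(0, 1) for _ in range(n)]),
]
selected = stream_select(stream, labels, max_features=2)
print(selected)
```

Because every method is held to the same `max_features`, the selected sets from different selectors can be passed to the same learner and compared on accuracy alone, which is the comparison protocol the abstract argues for.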
