LG · ML · Jun 27, 2012

Discovering Support and Affiliated Features from Very High Dimensions

arXiv:1206.6477v1 · 34 citations
Originality: Incremental advance
AI Analysis

This paper addresses feature selection and interpretation in high-dimensional datasets, a problem central to fields such as bioinformatics and data mining. The contribution appears incremental, since it builds on existing embedded feature selection methods.

The paper tackles the problem of identifying groups of informative, correlated features in very high-dimensional data. It reports significant prediction performance improvements over the state-of-the-art feature selection methods it compares against, and the group structures it discovers support better interpretation.

In this paper, a novel learning paradigm is presented to automatically identify groups of informative and correlated features from very high dimensions. Specifically, we explicitly incorporate correlation measures as constraints and then propose an efficient embedded feature selection method using a recently developed cutting plane strategy. The benefits of the proposed algorithm are twofold. First, it can identify the optimal subset of discriminative, uncorrelated features for the output labels, denoted here as Support Features, which brings about significant improvements in prediction performance over the other state-of-the-art feature selection methods considered in the paper. Second, during the learning process, the underlying group structures of correlated features associated with each support feature, denoted as Affiliated Features, can also be discovered without any additional cost. These affiliated features serve to improve the interpretation of the learning tasks. Extensive empirical studies on both synthetic and very high-dimensional real-world datasets verify the validity and efficiency of the proposed method.
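To make the distinction between Support Features and Affiliated Features concrete, the following is a minimal sketch under stated assumptions. It is not the paper's embedded cutting plane method: it swaps in a simple greedy filter that ranks features by absolute Pearson correlation with the labels, keeps a feature as a support feature only if it is not strongly correlated with an already chosen one, and otherwise records it as affiliated to its most correlated support feature. The function names, the `n_support` budget, and the `corr_threshold` value are all hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def support_and_affiliated(columns, labels, n_support, corr_threshold=0.9):
    """Greedy illustration only; NOT the paper's cutting plane algorithm.

    columns: list of feature columns, each a list of floats.
    Returns a dict {support_index: [affiliated_indices]}.
    """
    # Relevance of each feature: absolute correlation with the labels.
    relevance = [abs(pearson(col, labels)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -relevance[j])

    groups = {}
    for j in order:
        # Find the already chosen support feature most correlated with feature j.
        best, best_corr = None, 0.0
        for s in groups:
            c = abs(pearson(columns[j], columns[s]))
            if c > best_corr:
                best, best_corr = s, c
        if best is not None and best_corr >= corr_threshold:
            groups[best].append(j)   # redundant with a support feature: affiliated
        elif len(groups) < n_support:
            groups[j] = []           # informative and not redundant: new support feature
        # otherwise the support budget is full and the feature is dropped
    return groups


# Toy data (made up for this sketch): six samples, four features.
labels = [1, 1, 1, -1, -1, -1]
columns = [
    [1.0, 2.0, 3.0, -1.0, -2.0, -3.0],  # f0: informative
    [1.0, 2.0, 3.0, -1.0, -2.0, -4.0],  # f1: near copy of f0
    [3.0, 1.0, 1.0, -3.0, -1.0, -1.0],  # f2: informative, weaker link to f0
    [1.0, -1.0, 0.0, 1.0, -1.0, 0.0],   # f3: uncorrelated with the labels
]
groups = support_and_affiliated(columns, labels, n_support=2)
print(groups)  # {0: [1], 2: []}
```

In this toy run, f0 and f2 play the role of support features and f1 is reported as affiliated to f0. The abstract describes obtaining such groups as a by-product of a single embedded learning procedure, whereas this sketch uses a separate pairwise correlation pass purely to illustrate the output structure.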

