LG ML · Jun 27, 2012

Discovering Support and Affiliated Features from Very High Dimensions

Yiteng Zhai, Mingkui Tan, Ivor Tsang, Yew Soon Ong

arXiv:1206.6477v1 · 34 citations

Originality: Incremental advance

AI Analysis

This paper addresses feature selection and interpretation in high-dimensional datasets, a problem that matters in fields such as bioinformatics and data mining. The advance appears incremental, since it builds on existing embedded feature selection methods.

The paper tackles the problem of identifying groups of informative, correlated features in very high-dimensional data. It reports significant improvements in prediction performance over the state-of-the-art feature selection methods it compares against, and it recovers the underlying group structures, which aids interpretation.

In this paper, a novel learning paradigm is presented to automatically identify groups of informative and correlated features from very high dimensions. Specifically, we explicitly incorporate correlation measures as constraints and then propose an efficient embedded feature selection method using a recently developed cutting-plane strategy. The benefits of the proposed algorithm are twofold. First, it can identify the optimal subset of discriminative and uncorrelated features with respect to the output labels, denoted here as Support Features, which brings about significant improvements in prediction performance over the other state-of-the-art feature selection methods considered in the paper. Second, during the learning process, the underlying group structures of correlated features associated with each support feature, denoted as Affiliated Features, can also be discovered at no additional cost. These affiliated features serve to improve the interpretation of the learning tasks. Extensive empirical studies on both synthetic and very high-dimensional real-world datasets verify the validity and efficiency of the proposed method.
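
To make the distinction between support features and affiliated features concrete, here is a minimal Python sketch. It is not the paper's cutting-plane method: it swaps in a simple greedy heuristic that ranks features by absolute Pearson correlation with the labels, then attaches to each chosen feature the remaining features that are highly correlated with it. The function names, the `corr_threshold` parameter, and the toy data are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def support_and_affiliated(columns, labels, n_support, corr_threshold=0.8):
    """Greedy stand-in (hypothetical helper): returns {support_index: [affiliated_indices]}."""
    # Rank features by how strongly they correlate with the labels.
    order = sorted(range(len(columns)),
                   key=lambda j: -abs(pearson(columns[j], labels)))
    groups, assigned = {}, set()
    for j in order:
        if len(groups) == n_support:
            break
        if j in assigned:
            continue  # already affiliated with an earlier support feature
        # Unassigned features highly correlated with j become its affiliated features.
        affiliated = [k for k in range(len(columns))
                      if k != j and k not in assigned
                      and abs(pearson(columns[j], columns[k])) >= corr_threshold]
        groups[j] = affiliated
        assigned.add(j)
        assigned.update(affiliated)
    return groups


def make_toy_data(n=300, seed=0):
    """Toy data (assumed): features 0-1 share signal a, 2-3 share signal b, 4 is noise."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    columns = [
        a,
        [v + 0.1 * rng.gauss(0, 1) for v in a],
        b,
        [v + 0.1 * rng.gauss(0, 1) for v in b],
        [rng.gauss(0, 1) for _ in range(n)],
    ]
    labels = [1 if u + 0.5 * v > 0 else -1 for u, v in zip(a, b)]
    return columns, labels


columns, labels = make_toy_data()
print(support_and_affiliated(columns, labels, n_support=2))
```

On this toy data, the two correlated pairs should each end up as one support feature plus one affiliated feature, while the pure-noise feature is left out. The paper, by contrast, encodes the correlation constraints directly in the learning problem rather than applying a post hoc threshold.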
