LG · ML · Feb 27, 2020

Correlated Feature Selection with Extended Exclusive Group Lasso

arXiv:2002.12460v1 · 10 citations
AI Analysis

This paper addresses feature selection in biological applications, where correlated features such as genes are common. The contribution is incremental, since it builds on existing Lasso derivatives.

The paper tackles the problem of feature selection in high-dimensional biological data where correlated features cause Lasso to perform poorly, and proposes an extended exclusive group Lasso method that improves comprehensive selection of informative features over Lasso in experiments with synthetic and real-world data.

In many high dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since this can provide mechanistic insight and conceptual understanding. Lasso and related algorithms have been widely used since their sparse solutions naturally identify a set of informative features. However, Lasso performs erratically when features are correlated. This limits the use of such algorithms in biological problems, where features such as genes often work together in pathways, leading to sets of highly correlated features. In this paper, we examine the performance of a Lasso derivative, the exclusive group Lasso, in this setting. We propose fast algorithms to solve the exclusive group Lasso, and introduce a solution to the case when the underlying group structure is unknown. The solution combines stability selection with random group allocation and introduction of artificial features. Experiments with both synthetic and real-world data highlight the advantages of this proposed methodology over Lasso in comprehensive selection of informative features.
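The abstract does not spell out the penalty behind the exclusive group Lasso. As a rough illustration, the sketch below assumes the form commonly given in the wider literature: the sum, over groups, of the squared L1 norm of the coefficients in each group. The paper's exact formulation may differ, and the function name, the grouping, and the weight vectors are all hypothetical. The two vectors have the same plain L1 norm, so Lasso's penalty cannot tell them apart, whereas this penalty charges more when the non-zero weights share a group.

```python
def exclusive_group_lasso_penalty(w, groups):
    """Sum over groups of the squared L1 norm of that group's coefficients.

    Assumed form of the penalty (not taken from the abstract):
        sum_g ( sum_{i in g} |w_i| ) ** 2
    """
    return sum(sum(abs(w[i]) for i in g) ** 2 for g in groups)


# Hypothetical grouping of four features into two groups.
groups = [[0, 1], [2, 3]]

# Two weight vectors with identical L1 norm (2.0).
same_group = [1.0, 1.0, 0.0, 0.0]  # both non-zero weights in group 0
split = [1.0, 0.0, 1.0, 0.0]       # one non-zero weight per group

penalty_same = exclusive_group_lasso_penalty(same_group, groups)
penalty_split = exclusive_group_lasso_penalty(split, groups)

print(penalty_same, penalty_split)  # prints: 4.0 2.0
```

Under this assumed penalty, features compete within a group but not across groups, so which features end up being selected together depends on how they are allocated to groups. That is why the abstract's case of an unknown group structure needs a separate solution.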
