LGOct 27, 2023

Feature Selection in the Contrastive Analysis Setting

arXiv:2310.18531v19 citationsh-index: 15Has Code
Originality Incremental advance
AI Analysis

This addresses a gap in machine learning for biomedical data analysis, enabling more precise identification of disease-specific features, though it appears incremental as it builds on existing contrastive analysis concepts.

The paper tackles the problem of feature selection in contrastive analysis, where variations unique to a target dataset are identified compared to a background dataset, and presents contrastive feature selection (CFS), which consistently outperforms existing state-of-the-art supervised and unsupervised methods on biomedical datasets.

Contrastive analysis (CA) refers to the exploration of variations uniquely enriched in a target dataset as compared to a corresponding background dataset generated from sources of variation that are irrelevant to a given task. For example, a biomedical data analyst may wish to find a small set of genes to use as a proxy for variations in genomic data only present among patients with a given disease (target) as opposed to healthy control subjects (background). However, as of yet the problem of feature selection in the CA setting has received little attention from the machine learning community. In this work we present contrastive feature selection (CFS), a method for performing feature selection in the CA setting. We motivate our approach with a novel information-theoretic analysis of representation learning in the CA setting, and we empirically validate CFS on a semi-synthetic dataset and four real-world biomedical datasets. We find that our method consistently outperforms previously proposed state-of-the-art supervised and fully unsupervised feature selection methods not designed for the CA setting. An open-source implementation of our method is available at https://github.com/suinleelab/CFS.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes