LG Mar 16, 2023

Multi-modal Differentiable Unsupervised Feature Selection

arXiv:2303.09381v1 | 6 citations | h-index: 70
Originality: Incremental advance
AI Analysis

This work addresses the computational challenge of feature selection in multi-modal biological data. The advance is incremental because it builds on existing Laplacian-based methods.

The paper addresses the problem of identifying informative variables in multi-modal, high-dimensional biological data. It proposes an unsupervised feature selection framework that distinguishes shared latent structures from differential ones. The method is illustrated on synthetic and real datasets, including single-cell multi-omics, where the authors report that gating out nuisance features makes the structure captured by the graph Laplacian more accurate.

Multi-modal high-throughput biological data presents a great scientific opportunity and a significant computational challenge. In multi-modal measurements, every sample is observed simultaneously by two or more sets of sensors. In such settings, many observed variables in both modalities are often nuisance variables that carry no information about the phenomenon of interest. Here, we propose a multi-modal unsupervised feature selection framework that identifies informative variables based on coupled high-dimensional measurements. Our method is designed to identify features associated with two types of latent low-dimensional structures: (i) shared structures that govern the observations in both modalities and (ii) differential structures that appear in only one modality. To that end, we propose two Laplacian-based scoring operators. We incorporate the scores with differentiable gates that mask nuisance features and enhance the accuracy of the structure captured by the graph Laplacian. The performance of the new scheme is illustrated using synthetic and real datasets, including an extended biological application to single-cell multi-omics.
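To make the idea of Laplacian-based feature scoring concrete, here is a minimal single-modality sketch in Python. It implements the classic Laplacian Score, not the two multi-modal operators proposed in the paper, and it omits the differentiable gates entirely. The function names (`gaussian_affinity`, `laplacian_score`), the median-based kernel bandwidth, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_affinity(X):
    """Dense Gaussian-kernel affinity matrix between the rows (samples) of X.

    Assumption: the bandwidth is the median squared pairwise distance.
    """
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    bandwidth = np.median(D2[D2 > 0])
    W = np.exp(-D2 / bandwidth)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def laplacian_score(X, W):
    """Classic Laplacian Score per feature; lower means smoother on the graph."""
    d = W.sum(axis=1)                    # node degrees
    L = np.diag(d) - W                   # unnormalized graph Laplacian
    Xc = X - (d @ X) / d.sum()           # remove the degree-weighted mean
    num = np.sum(Xc * (L @ Xc), axis=0)  # f^T L f for each feature f
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)  # f^T D f for each feature f
    return num / np.maximum(den, 1e-12)

# Hypothetical synthetic data: 2 informative features that separate two
# clusters, followed by 8 pure-noise (nuisance) features.
rng = np.random.default_rng(0)
centers = np.repeat([-3.0, 3.0], 100)
signal = centers[:, None] + 0.5 * rng.standard_normal((200, 2))
noise = rng.standard_normal((200, 8))
X = np.hstack([signal, noise])

W = gaussian_affinity(X)
scores = laplacian_score(X, W)
selected = np.argsort(scores)[:2]  # the two smoothest features
```

In this sketch the graph is built from all features, nuisance ones included. As the share of nuisance features grows, such a graph reflects the latent structure less faithfully, which is the motivation the abstract gives for masking nuisance features with gates before computing the Laplacian.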

Code Implementations: 1 repo
