ML LG GN | Mar 6, 2020

BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders

arXiv:2003.03462v13.82 citations | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for more interpretable and efficient analysis in domains such as genomics, though it is an incremental improvement over existing VAE methods.

Analysis of high-dimensional tabular data, such as genomics data, typically requires separate steps for dimensionality reduction and feature clustering. The authors propose BasisVAE, a joint modeling framework that integrates the two tasks, and demonstrate scalable inference on single-cell gene expression data.

Variational Autoencoders (VAEs) provide a flexible and scalable framework for non-linear dimensionality reduction. However, in application domains such as genomics where data sets are typically tabular and high-dimensional, a black-box approach to dimensionality reduction does not provide sufficient insights. Common data analysis workflows additionally use clustering techniques to identify groups of similar features. This usually leads to a two-stage process; however, it would be desirable to construct a joint modelling framework for simultaneous dimensionality reduction and clustering of features. In this paper, we propose to achieve this through the BasisVAE: a combination of the VAE and a probabilistic clustering prior, which lets us learn a one-hot basis function representation as part of the decoder network. Furthermore, for scenarios where not all features are aligned, we develop an extension to handle translation-invariant basis functions. We show how a collapsed variational inference scheme leads to scalable and efficient inference for BasisVAE, demonstrated on various toy examples as well as on single-cell gene expression data.
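To make the "one-hot basis function representation as part of the decoder" more concrete, here is a minimal NumPy sketch of a decoder forward pass in which every feature's mean is read off one of K shared basis functions. It is an illustration under stated assumptions, not the authors' implementation: the network shape, the names `basis_functions`, `decode`, `params`, and `phi`, and the random weights are all hypothetical. The encoder, the clustering prior, the collapsed variational inference scheme, and the translation-invariant extension are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def basis_functions(z, params):
    """Evaluate K shared basis functions at latents z: (N, D) -> (N, K).

    Assumption: a one-hidden-layer tanh network stands in for the decoder.
    """
    W1, b1, W2, b2 = params
    h = np.tanh(z @ W1 + b1)   # (N, H) hidden layer
    return h @ W2 + b2         # (N, K) one output per basis function

def decode(z, params, phi):
    """Feature means (N, P) given feature-to-basis assignments phi (P, K).

    Each row of phi sums to 1. A one-hot row means the feature copies
    exactly one basis function; a soft row gives a weighted mixture.
    """
    return basis_functions(z, params) @ phi.T

# Hypothetical sizes: N cells, D latent dims, H hidden units,
# K basis functions (clusters), P features (e.g. genes).
rng = np.random.default_rng(0)
N, D, H, K, P = 5, 1, 8, 3, 10
z = rng.normal(size=(N, D))
params = (rng.normal(size=(D, H)), np.zeros(H),
          rng.normal(size=(H, K)), np.zeros(K))

# Soft assignments, as a variational posterior over cluster labels might give.
assign_logits = rng.normal(size=(P, K))
phi_soft = softmax(assign_logits, axis=1)
mean_soft = decode(z, params, phi_soft)

# Hard (one-hot) assignments: each feature is tied to a single basis function.
clusters = assign_logits.argmax(axis=1)
phi_hard = np.eye(K)[clusters]
mean_hard = decode(z, params, phi_hard)
```

With one-hot assignments, features in the same cluster share an identical decoded curve, which is what allows the assignment matrix to be read as a clustering of features.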

