LG · QM · Dec 6, 2022

Improving Molecule Properties Through 2-Stage VAE

arXiv:2212.02750v1 · 1 citation · h-index: 65
Originality: Incremental advance
AI Analysis

This work addresses a deficiency in VAEs for drug discovery, offering an incremental improvement in generating molecules with better property similarity.

The paper tackles the problem of poor manifold recovery in variational autoencoders (VAEs) for drug discovery by proposing a 2-stage VAE, where the second stage is trained on the latent space of the first. It shows that this method significantly improves property statistics on the ChEMBL and polymer datasets compared to a pre-existing method.

The variational autoencoder (VAE) is a popular method for drug discovery, and many architectures and pipelines have been proposed to improve its performance. But the VAE model itself suffers from deficiencies, such as poor manifold recovery when the data lie on a low-dimensional manifold embedded in a higher-dimensional ambient space, and these deficiencies manifest differently in each application. Their consequences in drug discovery are somewhat under-explored. In this paper, we study how to improve the similarity between data generated by a VAE and the training dataset by improving manifold recovery with a 2-stage VAE, where the second-stage VAE is trained on the latent space of the first. We experimentally evaluated our approach on the ChEMBL dataset as well as a polymer dataset. On both datasets, the 2-stage VAE method improves the property statistics significantly over a pre-existing method.
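The generation path implied by a 2-stage VAE can be sketched as follows. This is a minimal illustration of the data flow only, not the authors' implementation: the names `linear`, `stage2_decoder`, `stage1_decoder`, and `generate` are hypothetical, and the fixed affine maps stand in for trained neural decoders (which in practice would emit molecule representations rather than small float vectors). The point it shows is that a sample is drawn from the second stage's simple prior, mapped into the first stage's latent space, and only then decoded to data space.

```python
import random


def linear(weights, bias, vec):
    """Apply an affine map; a stand-in for a trained decoder network."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


# Hypothetical "trained" parameters, chosen arbitrarily for illustration.
STAGE2_W = [[0.5, 0.1], [-0.2, 0.8]]            # second-stage latent u -> first-stage latent z
STAGE2_B = [0.3, -0.1]
STAGE1_W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0]]  # z -> data space x
STAGE1_B = [0.0, 0.0, 0.1, 0.2]


def stage2_decoder(u):
    """Second-stage decoder: maps a prior sample into the first-stage latent space."""
    return linear(STAGE2_W, STAGE2_B, u)


def stage1_decoder(z):
    """First-stage decoder: maps a first-stage latent vector into data space."""
    return linear(STAGE1_W, STAGE1_B, z)


def generate(rng, latent_dim=2):
    """Draw one sample by chaining the two decoders."""
    # 1. Sample from the second-stage prior, a standard normal.
    u = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    # 2. Map it to a first-stage latent vector. This replaces sampling the
    #    first-stage latent directly from a standard normal, as a single VAE would.
    z = stage2_decoder(u)
    # 3. Decode to data space with the first-stage decoder.
    return stage1_decoder(z)


sample = generate(random.Random(0))
```

Training would mirror this in reverse under the same assumptions: fit the first VAE on the data, encode the training set to latent vectors, then fit the second VAE on those vectors.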

