LG | BM | Jul 3, 2024

NEBULA: Neural Empirical Bayes Under Latent Representations for Efficient and Controllable Design of Molecular Libraries

arXiv:2407.03428v1 | 4 citations | h-index: 10 | Has Code
AI Analysis

This work addresses slow molecular library generation for drug discovery. It reports a speedup of nearly an order of magnitude, though the contribution is incremental: it builds on prior latent-space and neural empirical Bayes methods.

The paper tackles the challenge of efficiently generating large molecular libraries around a seed compound. It introduces NEBULA, a latent 3D generative model that generates libraries nearly an order of magnitude faster than existing methods without sacrificing sample quality, as demonstrated on two public datasets and several recently released drugs.

We present NEBULA, the first latent 3D generative model for scalable generation of large molecular libraries around a seed compound of interest. Such libraries are crucial for scientific discovery, but it remains challenging to generate large numbers of high quality samples efficiently. 3D-voxel-based methods have recently shown great promise for generating high quality samples de novo from random noise (Pinheiro et al., 2023). However, sampling in 3D-voxel space is computationally expensive and use in library generation is prohibitively slow. Here, we instead perform neural empirical Bayes sampling (Saremi & Hyvarinen, 2019) in the learned latent space of a vector-quantized variational autoencoder. NEBULA generates large molecular libraries nearly an order of magnitude faster than existing methods without sacrificing sample quality. Moreover, NEBULA generalizes better to unseen drug-like molecules, as demonstrated on two public datasets and multiple recently released drugs. We expect the approach herein to be highly enabling for machine learning-based drug discovery. The code is available at https://github.com/prescient-design/nebula
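
The neural empirical Bayes sampling mentioned in the abstract can be illustrated with a toy walk-jump sketch. This is not the NEBULA implementation: the learned denoiser over VQ-VAE latents is replaced here by an analytic Gaussian denoiser, the walk uses plain overdamped Langevin dynamics, and all function names, the latent dimension, and the step settings are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_denoiser(y, mu, prior_std, sigma):
    """Bayes-optimal denoiser E[z | y] for z ~ N(mu, prior_std^2 I), y = z + sigma * eps.

    Stand-in (assumption) for a trained neural denoiser over latents.
    """
    shrink = prior_std**2 / (prior_std**2 + sigma**2)
    return mu + shrink * (y - mu)


def smoothed_score(y, denoise, sigma):
    """Score of the noise-smoothed density, from the empirical Bayes (Tweedie) identity:
    E[z | y] = y + sigma^2 * grad log p(y)."""
    return (denoise(y) - y) / sigma**2


def walk_jump_sample(seed_latent, denoise, sigma, n_samples,
                     n_steps=50, step_size=0.05, rng=None):
    """Generate n_samples latents around seed_latent.

    Walk: Langevin steps on the smoothed density, started from the noised seed.
    Jump: a single denoising step back to a clean latent estimate.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Start every chain at the seed latent plus Gaussian noise of scale sigma.
    y = seed_latent + sigma * rng.standard_normal((n_samples,) + seed_latent.shape)
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step_size * smoothed_score(y, denoise, sigma) + np.sqrt(2.0 * step_size) * noise
    return denoise(y)


# Toy usage: an 8-dimensional latent seed and a standard normal latent prior (both hypothetical).
rng = np.random.default_rng(0)
seed_latent = np.ones(8)
denoise = lambda y: gaussian_denoiser(y, mu=0.0, prior_std=1.0, sigma=0.5)
library = walk_jump_sample(seed_latent, denoise, sigma=0.5, n_samples=256, rng=rng)
print(library.shape)  # (256, 8)
```

In a pipeline of the kind the abstract describes, the seed latent would presumably come from encoding the seed compound and each sampled latent would be decoded back to a molecule; those encoder and decoder steps are omitted here.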

Code Implementations: 1 repo