BMLG · Jun 8, 2023

Protein Discovery with Discrete Walk-Jump Sampling

Berkeley
arXiv:2306.12360v2 · 54 citations · h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating functional antibody proteins for biomedical applications, representing a domain-specific advancement with incremental improvements in training and sampling methods.

The paper tackles the challenge of training and sampling from discrete generative models by introducing Discrete Walk-Jump Sampling, which combines energy-based and score-based modeling and requires only a single noise level. In laboratory experiments, 97-100% of the generated antibody proteins were successfully expressed and purified, and 70% of the functional designs showed equal or improved binding affinity compared to known functional antibodies.

We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.
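The walk-jump loop described in the abstract can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation. Sequences are one-hot encoded and smoothed with Gaussian noise at one fixed level σ (y = x + σε). The "walk" runs plain unadjusted Langevin MCMC on the smoothed density, y ← y + η∇log p_σ(y) + √(2η)ξ. The "jump" applies one-step denoising, x̂ = y + σ²∇log p_σ(y), and then takes an argmax at each position to return to discrete tokens. Assumptions: the paper learns an energy-based model (with ∇log p_σ = -∇E) trained by contrastive divergence, whereas this sketch swaps in the closed-form score of a Gaussian-smoothed empirical distribution. The 4-letter `ALPHABET`, the Langevin variant, the initialization from a noisy training sequence, and all function names and defaults (`walk`, `jump`, `walk_jump_sample`, `step`, `n_steps`) are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy alphabet; real antibody sequences use the 20 amino acids.
ALPHABET = "ACDE"


def one_hot(seq):
    """Flatten a sequence into a one-hot vector of length len(seq) * len(ALPHABET)."""
    vec = []
    for ch in seq:
        vec.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
    return vec


def score(y, data, sigma):
    """Closed-form grad_y log p_sigma(y) for p_sigma = mean_i N(x_i, sigma^2 I).

    Stand-in (assumption) for a learned energy model, where score = -grad E.
    """
    # Log-weights -||y - x_i||^2 / (2 sigma^2), normalized with a stable softmax.
    logw = [-sum((yj - xj) ** 2 for yj, xj in zip(y, x)) / (2 * sigma ** 2) for x in data]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    w = [wi / z for wi in w]
    # Posterior-weighted average of (x_i - y) / sigma^2.
    return [sum(wi * (x[j] - y[j]) for wi, x in zip(w, data)) / sigma ** 2
            for j in range(len(y))]


def walk(y, data, sigma, step, n_steps, rng):
    """Walk: unadjusted Langevin MCMC on the smoothed density at fixed sigma."""
    for _ in range(n_steps):
        g = score(y, data, sigma)
        y = [yj + step * gj + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
             for yj, gj in zip(y, g)]
    return y


def jump(y, data, sigma):
    """Jump: one-step denoising x_hat = y + sigma^2 * score(y), then per-position argmax."""
    g = score(y, data, sigma)
    x_hat = [yj + sigma ** 2 * gj for yj, gj in zip(y, g)]
    k = len(ALPHABET)
    return "".join(
        ALPHABET[max(range(k), key=lambda a: x_hat[p * k + a])]
        for p in range(len(x_hat) // k)
    )


def walk_jump_sample(train_seqs, sigma=0.5, step=0.01, n_steps=200, seed=0):
    """Draw one discrete sample; the same sigma is used for both walk and jump."""
    rng = random.Random(seed)
    data = [one_hot(s) for s in train_seqs]
    # Initialize from a noisy copy of a training sequence: y = x + sigma * eps.
    x0 = rng.choice(data)
    y = [xj + sigma * rng.gauss(0.0, 1.0) for xj in x0]
    y = walk(y, data, sigma, step, n_steps, rng)
    return jump(y, data, sigma)


if __name__ == "__main__":
    print(walk_jump_sample(["ACDE", "ACDD", "CCDE"], seed=1))
```

Because the sketch's score is exact for a memorized training set, the denoised point x̂ is a weighted average of training sequences. The sketch therefore shows the mechanics of walking at one noise level and jumping back to the discrete manifold, not the generalization a trained model provides.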

Code Implementations: 1 repo
