BML · GML · Jul 16, 2024

Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design

arXiv:2407.11942v1 · 19 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the limited out-of-distribution generalization of generative models for drug discovery, materials science, and protein design, and represents an incremental improvement over existing methods.

The paper tackles the challenge of using diffusion models for molecular and protein design to generate high-value samples beyond the training data. It introduces context-guided diffusion (CGD), a plug-and-play method that improves out-of-distribution generalization and yields substantial performance gains across various settings.

Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials. Diffusion models have recently emerged as a powerful approach, excelling at unconditional sample generation and, with data-driven guidance, conditional generation within their training domain. Reliably sampling from high-value regions beyond the training data, however, remains an open challenge, with current methods predominantly focusing on modifying the diffusion process itself. In this paper, we develop context-guided diffusion (CGD), a simple plug-and-play method that leverages unlabeled data and smoothness constraints to improve the out-of-distribution generalization of guided diffusion models. We demonstrate that this approach leads to substantial performance gains across various settings, including continuous, discrete, and graph-structured diffusion processes with applications across drug discovery, materials science, and protein design.
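The abstract says only that CGD uses unlabeled data and smoothness constraints to make guided diffusion behave better outside the training domain. The toy sketch below illustrates one way those two ingredients *could* enter guided sampling. It is an assumption for illustration, not the paper's method. A guidance regressor built on smooth RBF features (our stand-in for a smoothness constraint) is fitted to labeled data. A penalty pulls its predictions on unlabeled "context" points towards a prior mean, and the gradient of its prediction is added to the score of a 1-D standard-normal prior in a Langevin sampling step. All names (`features`, `fit_guidance`, `guided_langevin_step`, `lam_ctx`, `prior_mean`) are hypothetical.

```python
import numpy as np

# Hypothetical smooth feature map: Gaussian RBF bumps on a fixed 1-D grid.
CENTERS = np.linspace(-4.0, 4.0, 17)
WIDTH = 0.7


def features(x):
    """RBF features phi(x) for 1-D inputs, shape (n, len(CENTERS))."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.exp(-((x - CENTERS) ** 2) / (2 * WIDTH**2))


def feature_grads(x):
    """Analytic derivative d phi(x) / dx, same shape as features(x)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return features(x) * (-(x - CENTERS) / WIDTH**2)


def fit_guidance(x_lab, y_lab, x_ctx, prior_mean=0.0, lam_ctx=1.0, ridge=1e-3):
    """Closed-form ridge regression on labeled data plus an assumed context
    penalty lam_ctx * ||phi(x_ctx) w - prior_mean||^2 on unlabeled points."""
    P, Pc = features(x_lab), features(x_ctx)
    A = P.T @ P + lam_ctx * Pc.T @ Pc + ridge * np.eye(P.shape[1])
    b = P.T @ np.asarray(y_lab, dtype=float) + lam_ctx * Pc.T @ np.full(len(Pc), prior_mean)
    return np.linalg.solve(A, b)


def guided_langevin_step(x, w, rng, step=0.01, scale=1.0):
    """One Langevin step on a toy N(0, 1) prior (score = -x), with the
    gradient of the guidance prediction phi(x) @ w added to the score."""
    guidance = feature_grads(x) @ w
    noise = rng.standard_normal(x.shape)
    return x + step * (-x + scale * guidance) + np.sqrt(2 * step) * noise


rng = np.random.default_rng(0)
x_lab = rng.uniform(-1.0, 1.0, 40)              # labeled data in a narrow region
y_lab = x_lab + 0.05 * rng.standard_normal(40)  # toy property that grows with x
x_ctx = rng.uniform(-4.0, 4.0, 200)             # unlabeled context over a wider region

w_plain = fit_guidance(x_lab, y_lab, x_ctx, lam_ctx=0.0)  # no context penalty
w_ctx = fit_guidance(x_lab, y_lab, x_ctx, lam_ctx=1.0)    # context-regularized

x = rng.standard_normal(500)
for _ in range(100):
    x = guided_langevin_step(x, w_ctx, rng)
```

`w_ctx` minimizes the labeled-data loss plus the context penalty. Its squared deviation from `prior_mean` on the context points therefore cannot exceed that of `w_plain`. Away from the labeled region, this is meant to keep the guidance gradient from extrapolating aggressively. In a real molecular or protein model, the toy prior score would be replaced by the diffusion model's learned score, and the scalar inputs by noised molecules or sequences.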

Code Implementations: 1 repo
