LG · AI · BM · Nov 4, 2024

Bridge-IF: Learning Inverse Protein Folding with Markov Bridges

arXiv:2411.02120v1 · 12 citations · h-index: 10 · Has Code · NIPS
Originality: Highly original
AI Analysis

This work addresses a fundamental challenge in computational protein design, offering researchers and practitioners a novel generative approach that overcomes limitations of discriminative methods.

The paper tackles the inverse protein folding problem by proposing Bridge-IF, a generative diffusion bridge model that learns the probabilistic dependencies between backbone structures and protein sequences, yielding better sequence recovery and more foldable protein designs than existing baselines.
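The "diffusion bridge" above is, in the Markov bridge formulation, a discrete chain pinned to a structure-derived prior at one end and to the native sequence at the other. A minimal per-residue sketch, assuming the transition matrices used in earlier Markov bridge models such as RetroBridge (Bridge-IF's exact parameterization may differ): $z_t$ and the native residue $y$ are one-hot row vectors over $K = 20$ amino-acid types, $\mathbf{1}_K$ is a column of ones, $X$ is the backbone structure, and $\alpha_t \in [0, 1]$ is a hypothetical per-step schedule with $\alpha_T = 0$.

```latex
\begin{aligned}
Q_t &= \alpha_t I_K + (1-\alpha_t)\,\mathbf{1}_K\, y \\
q(z_t \mid z_{t-1}, y) &= \mathrm{Cat}\big(z_t;\ z_{t-1} Q_t\big)
  = \mathrm{Cat}\big(z_t;\ \alpha_t z_{t-1} + (1-\alpha_t)\, y\big) \\
\bar{Q}_t = Q_1 Q_2 \cdots Q_t &= \bar{\alpha}_t I_K + (1-\bar{\alpha}_t)\,\mathbf{1}_K\, y,
  \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \\
q(z_t \mid z_0, y) &= \mathrm{Cat}\big(z_t;\ \bar{\alpha}_t z_0 + (1-\bar{\alpha}_t)\, y\big),
  \qquad \bar{\alpha}_0 = 1,\ \bar{\alpha}_T = 0 \\
p_\theta(z_{t+1} \mid z_t, X) &= \sum_{y} q(z_{t+1} \mid z_t, y)\, p_\theta(y \mid z_t, X)
\end{aligned}
```

The closed form for $\bar{Q}_t$ holds because $y\,\mathbf{1}_K = 1$, so products of matrices of this form keep the same form. Every intermediate residue is therefore either its prior identity or its native identity, and the learned transition $p_\theta$ replaces the unknown $y$ with the network's prediction.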

Inverse protein folding is a fundamental task in computational protein design, which aims to design protein sequences that fold into the desired backbone structures. While the development of machine learning algorithms for this task has seen significant success, the prevailing approaches, which predominantly employ a discriminative formulation, frequently encounter the error accumulation issue and often fail to capture the extensive variety of plausible sequences. To fill these gaps, we propose Bridge-IF, a generative diffusion bridge model for inverse folding, which is designed to learn the probabilistic dependency between the distributions of backbone structures and protein sequences. Specifically, we harness an expressive structure encoder to propose a discrete, informative prior derived from structures, and establish a Markov bridge to connect this prior with native sequences. During the inference stage, Bridge-IF progressively refines the prior sequence, culminating in a more plausible design. Moreover, we introduce a reparameterization perspective on Markov bridge models, from which we derive a simplified loss function that facilitates more effective training. We also modulate protein language models (PLMs) with structural conditions to precisely approximate the Markov bridge process, thereby significantly enhancing generation performance while maintaining parameter-efficient training. Extensive experiments on well-established benchmarks demonstrate that Bridge-IF predominantly surpasses existing baselines in sequence recovery and excels in the design of plausible proteins with high foldability. The code is available at https://github.com/violet-sto/Bridge-IF.
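The training and inference procedure described in the abstract (corrupting toward a structure-derived prior, then progressively refining that prior into a plausible sequence) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the Bridge-IF implementation in the repository linked above: the linear schedule, the residue ordering in `AMINO_ACIDS`, the cross-entropy form of `simplified_loss`, and the `denoiser(z, t)` interface (standing in for the structure-conditioned PLM) are all hypothetical.

```python
import numpy as np

# Hypothetical residue ordering; K = 20 amino-acid types.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)


def alpha_bar_schedule(T):
    """Hypothetical linear schedule: alpha_bar[0] = 1 (pure prior), alpha_bar[T] = 0 (pure native)."""
    return 1.0 - np.arange(T + 1) / T


def step_alphas(alpha_bar):
    """Per-step keep probabilities alpha_t = alpha_bar_t / alpha_bar_{t-1} for t = 1..T."""
    alpha = np.ones_like(alpha_bar)
    alpha[1:] = alpha_bar[1:] / alpha_bar[:-1]
    return alpha


def sample_bridge(prior, native, t, alpha_bar, rng):
    """Training-time draw of z_t: each residue keeps its prior identity w.p. alpha_bar_t, else takes the native one."""
    keep_prior = rng.random(prior.shape) < alpha_bar[t]
    return np.where(keep_prior, prior, native)


def simplified_loss(logits, native):
    """Assumed form of the simplified loss: mean cross-entropy of predicting the native residue."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(native)), native].mean()


def refine(prior, denoiser, T, rng):
    """Inference: walk the bridge from the prior (t = 0) to a designed sequence (t = T)."""
    alpha = step_alphas(alpha_bar_schedule(T))
    z = prior.copy()
    for t in range(T):
        probs = denoiser(z, t)  # (L, K) predicted distribution over native residues
        u = rng.random((len(z), 1))
        y_hat = (probs.cumsum(axis=1) <= u).sum(axis=1)  # inverse-CDF categorical sample
        y_hat = np.minimum(y_hat, K - 1)  # guard against round-off at the top of the CDF
        keep = rng.random(len(z)) < alpha[t + 1]  # keep z_t w.p. alpha_{t+1}, else jump to y_hat
        z = np.where(keep, z, y_hat)
    return z


# Usage with a hypothetical oracle denoiser that always predicts the native sequence.
native = np.array([AMINO_ACIDS.index(a) for a in "MKTAYIAK"])


def oracle(z, t):
    return np.eye(K)[native]


rng = np.random.default_rng(0)
prior = rng.integers(0, K, size=len(native))  # stand-in for the structure-encoder prior
design = refine(prior, oracle, T=10, rng=rng)
print("".join(AMINO_ACIDS[i] for i in design))  # MKTAYIAK
```

Sampling `y_hat` from the predicted distribution and then mixing it with the current sequence is exact ancestral sampling from the mixture $p_\theta(z_{t+1} \mid z_t) = \sum_y q(z_{t+1} \mid z_t, y)\, p_\theta(y \mid z_t)$. Because the last step has $\alpha_T = 0$, the final output is always a full draw from the model's prediction.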

Code Implementations: 1 repo
