QM · LG · BM · Sep 30, 2022

State-specific protein-ligand complex structure prediction with a multi-scale deep generative model

arXiv:2209.15171v2 · 164 citations · h-index: 78
AI Analysis

The work addresses a critical bottleneck in computational biology for drug and enzyme design: accurate prediction of protein-ligand binding structures.

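The paper tackles the prediction of protein-ligand complex structures, which existing algorithms cannot yet do systematically. It presents NeuralPLexer, a deep generative model that reports state-of-the-art performance on benchmarks for blind docking and flexible binding site structure recovery. It also outperforms AlphaFold2 in global protein structure accuracy, with an average TM-score of 0.93 on representative structure pairs with large conformational changes and 0.89 on recently determined ligand-binding proteins.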

The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life. Despite recent advancements in protein structure prediction, existing algorithms are so far unable to systematically predict the binding ligand structures along with their regulatory effects on protein folding. To address this discrepancy, we present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures solely using protein sequence and ligand molecular graph inputs. NeuralPLexer adopts a deep generative model to sample the 3D structures of the binding complex and their conformational changes at an atomistic resolution. The model is based on a diffusion process that incorporates essential biophysical constraints and a multi-scale geometric deep learning system to iteratively sample residue-level contact maps and all heavy-atom coordinates in a hierarchical manner. NeuralPLexer achieves state-of-the-art performance compared to all existing methods on benchmarks for both protein-ligand blind docking and flexible binding site structure recovery. Moreover, owing to its specificity in sampling both ligand-free-state and ligand-bound-state ensembles, NeuralPLexer consistently outperforms AlphaFold2 in terms of global protein structure accuracy on both representative structure pairs with large conformational changes (average TM-score=0.93) and recently determined ligand-binding proteins (average TM-score=0.89). Case studies reveal that the predicted conformational variations are consistent with structure determination experiments for important targets, including human KRAS$^\textrm{G12C}$, ketol-acid reductoisomerase, and purine GPCRs. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
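The abstract reports global protein structure accuracy as average TM-scores. As a reading aid, here is a minimal sketch of how a TM-score is computed from matched C-alpha coordinates, using the standard definition: the mean over target residues of 1 / (1 + (d_i / d0)^2), with d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The function names `tm_d0` and `tm_score_prealigned` are hypothetical, and the sketch assumes the two structures are already superimposed and residue-matched. The full metric also maximizes over superpositions, which is omitted here. This is not the evaluation code used in the paper.

```python
import math


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) from the TM-score definition."""
    # Assumption: clamp d0 to 0.5 angstroms for short chains, a common
    # convention in TM-score tools; the cube root is undefined for L <= 15.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_prealigned(pred, ref):
    """TM-score of residue-matched, already superimposed C-alpha coordinates.

    pred, ref: sequences of (x, y, z) tuples in angstroms, one per residue.
    The score is normalized by the length of the reference structure.
    """
    if len(pred) != len(ref) or len(ref) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    d0 = tm_d0(len(ref))
    total = 0.0
    for p, r in zip(pred, ref):
        d = math.dist(p, r)  # per-residue deviation d_i
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / len(ref)


if __name__ == "__main__":
    # Toy example: an idealized straight chain and a copy shifted by 1 angstrom.
    ref = [(3.8 * i, 0.0, 0.0) for i in range(100)]
    pred = [(x, y + 1.0, z) for (x, y, z) in ref]
    print(tm_score_prealigned(pred, ref))
```

Scores lie in (0, 1], and the length-dependent d0 makes values comparable across proteins of different sizes, which is why averages such as 0.93 and 0.89 can be reported over whole benchmark sets.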
