BMLG, Jun 20, 2025

Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings

arXiv:2506.17064v4 | 2 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This work provides a practical tool for system-specific, all-atom ensemble generation of large proteins, which could aid structure-based therapeutic design on complex, dynamic targets. The advance is incremental, however, since it builds on existing diffusion and graph neural network methods.

The authors address the problem of generating diverse, all-atom conformational ensembles for dynamic proteins such as GPCRs. Their sequential and residue-based variants achieve high structural fidelity (all-atom lDDT of approximately 0.7) and reproduce backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence below 0.03 relative to reference MD data.
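
lDDT scores how well the local interatomic distances of a reference structure are preserved in a model. The NumPy sketch below computes a simplified global variant, assuming that atoms in the two structures are matched by index and using the usual 15 Å inclusion radius with 0.5, 1, 2 and 4 Å tolerances. It omits the per-residue averaging, intra-residue exclusions and stereochemistry checks of the full score. `simple_lddt` is a hypothetical helper, not the authors' evaluation code.

```python
import numpy as np


def simple_lddt(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT between two (N, 3) coordinate arrays.

    Assumptions (hypothetical helper): atoms are matched by index, every
    atom pair counts equally, and there is no per-residue averaging,
    intra-residue exclusion or stereochemistry check.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Pairwise distance matrices, shape (N, N).
    d_ref = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    d_pred = np.linalg.norm(pred[:, None, :] - pred[None, :, :], axis=-1)
    # Score each unordered pair once, and only if it is local in the reference.
    iu = np.triu_indices(len(ref), k=1)
    local = d_ref[iu] < cutoff
    if not local.any():
        return float("nan")
    deviation = np.abs(d_ref[iu] - d_pred[iu])[local]
    # Fraction of preserved distances at each tolerance, averaged over tolerances.
    return float(np.mean([(deviation < t).mean() for t in thresholds]))
```

Because only distances enter the score, it is invariant to rigid rotations and translations, so no superposition step is needed. Passing only the Cα coordinates of each structure gives a Cα-only score.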

Generating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational diversity altogether. We present latent diffusion for full protein generation (LD-FPG), a framework that constructs complete all-atom protein structures, including every side-chain heavy atom, directly from molecular dynamics (MD) trajectories. LD-FPG employs a Chebyshev graph neural network (ChebNet) to obtain low-dimensional latent embeddings of protein conformations, which are processed using three pooling strategies: blind, sequential and residue-based. A diffusion model trained on these latent representations generates new samples that a decoder, optionally regularized by dihedral-angle losses, maps back to Cartesian coordinates. On D2R-MD, a 2-microsecond MD trajectory (12 000 frames) of the human dopamine D2 receptor in a membrane environment, the sequential and residue-based pooling strategies reproduce the reference ensemble with high structural fidelity (all-atom lDDT of approximately 0.7; Cα-lDDT of approximately 0.8) and recover backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence of less than 0.03 relative to the MD data. LD-FPG thereby offers a practical route to system-specific, all-atom ensemble generation for large proteins, providing a promising tool for structure-based therapeutic design on complex, dynamic targets. The D2R-MD dataset and our implementation are freely available to facilitate further research.
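
The encoder is described only as a Chebyshev graph neural network. For reference, the NumPy sketch below implements one generic Chebyshev spectral graph convolution, which filters node features with a K-term Chebyshev polynomial of the rescaled normalized Laplacian. The adjacency matrix (which atoms are connected), the feature sizes and the common approximation λ_max = 2 are assumptions for illustration. This is not the authors' encoder.

```python
import numpy as np


def scaled_laplacian(adj, lambda_max=2.0):
    """Rescaled Laplacian L_tilde = 2 L / lambda_max - I, L = I - D^-1/2 A D^-1/2."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0  # isolated nodes keep a zero scaling factor
    inv_sqrt[nz] = deg[nz] ** -0.5
    lap = np.eye(n) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return 2.0 * lap / lambda_max - np.eye(n)


def cheb_conv(x, adj, weights):
    """One Chebyshev graph convolution: sum_k T_k(L_tilde) @ x @ W_k.

    x: (N, F_in) node features; adj: (N, N) symmetric adjacency;
    weights: (K, F_in, F_out), one weight matrix per polynomial order.
    """
    l_tilde = scaled_laplacian(adj)
    t_prev, t_curr = x, l_tilde @ x  # T_0(L) x and T_1(L) x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for k in range(2, len(weights)):
        # Chebyshev recurrence: T_k = 2 L_tilde T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * l_tilde @ t_curr - t_prev
        out = out + t_curr @ weights[k]
    return out
```

A K-term filter only mixes information from nodes at most K - 1 hops apart, which keeps each layer spatially local without an eigendecomposition. Stacking such layers and pooling the node outputs would yield a conformation-level embedding. In practice a library layer such as PyTorch Geometric's `ChebConv` would typically replace this hand-rolled version.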
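
The abstract does not specify the diffusion formulation used on the latents. The sketch below assumes a standard DDPM with a linear noise schedule and ε-prediction, applied to flat latent vectors of shape (B, D). `eps_model` is a hypothetical stand-in for the trained denoising network, and all function names are illustrative.

```python
import numpy as np


def linear_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return betas, np.cumprod(1.0 - betas)


def q_sample(z0, t, alpha_bars, noise):
    """Forward noising: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    ab = np.asarray(alpha_bars[t])[..., None]  # broadcast over latent dims
    return np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * noise


def denoising_loss(eps_model, z0, alpha_bars, rng):
    """Epsilon-prediction MSE for a batch of latents z0 of shape (B, D)."""
    t = rng.integers(0, len(alpha_bars), size=len(z0))  # one random step per sample
    noise = rng.standard_normal(z0.shape)
    zt = q_sample(z0, t, alpha_bars, noise)
    return float(np.mean((eps_model(zt, t) - noise) ** 2))


def sample(eps_model, n, dim, betas, alpha_bars, rng):
    """Ancestral DDPM sampling from z_T ~ N(0, I) down to z_0."""
    z = rng.standard_normal((n, dim))
    for t in reversed(range(len(betas))):
        eps = eps_model(z, np.full(n, t))
        # Posterior mean: (z_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        z = mean
        if t > 0:
            # Add noise with variance sigma_t^2 = beta_t (one standard DDPM choice).
            z = mean + np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return z
```

Latents returned by `sample` would then be passed to the decoder to obtain Cartesian coordinates. Working in a low-dimensional latent space keeps each denoising step cheap compared with diffusing over every heavy-atom coordinate directly.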
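
Backbone (φ, ψ) and side-chain (χ) dihedrals are all signed torsion angles defined by four consecutive atoms. The sketch below computes them with the usual IUPAC sign convention. It also shows one plausible periodic penalty of the kind a dihedral-angle loss might use. The exact form of the authors' loss is not given here, so `dihedral_loss` is a hypothetical example.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in radians, range (-pi, pi], for points of shape (..., 3)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2  # consecutive bond vectors
    n1 = np.cross(b0, b1)  # normal of the plane (p0, p1, p2)
    n2 = np.cross(b1, b2)  # normal of the plane (p1, p2, p3)
    y = np.linalg.norm(b1, axis=-1) * np.sum(b0 * n2, axis=-1)
    x = np.sum(n1 * n2, axis=-1)
    return np.arctan2(y, x)


def dihedral_loss(phi_pred, phi_ref):
    """Hypothetical periodic penalty: mean of 1 - cos(phi_pred - phi_ref)."""
    delta = np.asarray(phi_pred, dtype=float) - np.asarray(phi_ref, dtype=float)
    return float(np.mean(1.0 - np.cos(delta)))
```

The 1 - cos form treats angles on either side of ±180° as close, which a plain squared difference of raw angles would not.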
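
The dihedral-angle comparison can be made concrete with a histogram-based Jensen-Shannon divergence between generated and reference angle samples. The 36 bins of 10° and the base-2 logarithm, which bounds the value in [0, 1], are assumptions because the abstract states neither. `dihedral_jsd` is a hypothetical helper.

```python
import numpy as np


def dihedral_jsd(angles_a, angles_b, bins=36, base=2.0):
    """Jensen-Shannon divergence between histograms of two angle samples (radians)."""
    edges = np.linspace(-np.pi, np.pi, bins + 1)
    p, _ = np.histogram(angles_a, bins=edges)
    q, _ = np.histogram(angles_b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        nz = a > 0  # terms with a == 0 contribute 0; b > 0 wherever a > 0
        return np.sum(a[nz] * np.log(a[nz] / b[nz]))

    return float((0.5 * kl(p, m) + 0.5 * kl(q, m)) / np.log(base))
```

SciPy's `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, which is the square root of this divergence. The two are therefore not interchangeable when comparing against a threshold such as 0.03.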
