CVOct 13, 2022
Amortized Inference for Heterogeneous Reconstruction in Cryo-EMAxel Levy, Gordon Wetzstein, Julien Martel et al.
Cryo-electron microscopy (cryo-EM) is an imaging modality that provides unique insights into the dynamics of proteins and other building blocks of life. The algorithmic challenge of jointly estimating the poses, 3D structure, and conformational heterogeneity of a biomolecule from millions of noisy and randomly oriented 2D projections in a computationally efficient manner, however, remains unsolved. Our method, cryoFIRE, performs ab initio heterogeneous reconstruction with unknown poses in an amortized framework, thereby avoiding the computationally expensive step of pose search while enabling the analysis of conformational heterogeneity. Poses and conformation are jointly estimated by an encoder while a physics-based decoder aggregates the images into an implicit neural representation of the conformational space. We show that our method can provide one order of magnitude speedup on datasets containing millions of images without any loss of accuracy. We validate that the joint estimation of poses and conformations can be amortized over the size of the dataset. For the first time, we prove that an amortized method can extract interpretable dynamic information from experimental datasets.
LGDec 2, 2025Code
Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR SpectraZiyu Xiong, Yichi Zhang, Foyez Alauddin et al.
Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural products and clinical therapeutics. Yet, interpreting NMR spectra remains a time-consuming, manual process requiring extensive domain expertise. We introduce ChefNMR (CHemical Elucidation From NMR), an end-to-end framework that directly predicts an unknown molecule's structure solely from its 1D NMR spectra and chemical formula. We frame structure elucidation as conditional generation from an atomic diffusion model built on a non-equivariant transformer architecture. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%. This work takes a significant step toward solving the grand challenge of automating small-molecule structure elucidation and highlights the potential of deep learning in accelerating molecular discovery. Code is available at https://github.com/ml-struct-bio/chefnmr.
CVAug 10, 2024
CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EMMinkyu Jeon, Rishwanth Raghu, Miro Astore et al.
Cryo-electron microscopy (cryo-EM) is a powerful technique for determining high-resolution 3D biomolecular structures from imaging data. Its unique ability to capture structural variability has spurred the development of heterogeneous reconstruction algorithms that can infer distributions of 3D structures from noisy, unlabeled imaging data. Despite the growing number of advanced methods, progress in the field is hindered by the lack of standardized benchmarks with ground truth information and reliable validation metrics. Here, we introduce CryoBench, a suite of datasets, metrics, and benchmarks for heterogeneous reconstruction in cryo-EM. CryoBench includes five datasets representing different sources of heterogeneity and degrees of difficulty. These include conformational heterogeneity generated from designed motions of antibody complexes or sampled from a molecular dynamics simulation, as well as compositional heterogeneity from mixtures of ribosome assembly states or 100 common complexes present in cells. We then analyze state-of-the-art heterogeneous reconstruction tools, including neural and non-neural methods, assess their sensitivity to noise, and propose new metrics for quantitative evaluation. We hope that CryoBench will be a foundational resource for accelerating algorithmic development and evaluation in the cryo-EM and machine learning communities. Project page: https://cryobench.cs.princeton.edu.
49.6CVMar 19
CryoHype: Reconstructing a thousand cryo-EM structures with transformer-based hypernetworksJeffrey Gu, Minkyu Jeon, Ambri Ma et al.
Cryo-electron microscopy (cryo-EM) is an indispensable technique for determining the 3D structures of dynamic biomolecular complexes. While typically applied to image a single molecular species, cryo-EM has the potential for structure determination of many targets simultaneously in a high-throughput fashion. However, existing methods typically focus on modeling conformational heterogeneity within a single or a few structures and are not designed to resolve compositional heterogeneity arising from mixtures of many distinct molecular species. To address this challenge, we propose CryoHype, a transformer-based hypernetwork for cryo-EM reconstruction that dynamically adjusts the weights of an implicit neural representation. Using CryoHype, we achieve state-of-the-art results on a challenging benchmark dataset containing 100 structures. We further demonstrate that CryoHype scales to the reconstruction of 1,000 distinct structures from unlabeled cryo-EM images in the fixed-pose setting.
LGJun 4, 2025
Multiscale guidance of AlphaFold3 with heterogeneous cryo-EM dataRishwanth Raghu, Axel Levy, Gordon Wetzstein et al.
Protein structure prediction models are now capable of generating accurate 3D structural hypotheses from sequence alone. However, they routinely fail to capture the conformational diversity of dynamic biomolecular complexes, often requiring heuristic MSA subsampling approaches for generating alternative states. In parallel, cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for imaging near-native structural heterogeneity, but is challenged by arduous pipelines to go from raw experimental data to atomic models. Here, we bridge the gap between these modalities, combining cryo-EM density maps with the rich sequence and biophysical priors learned by protein structure prediction models. Our method, CryoBoltz, guides the sampling trajectory of a pretrained protein structure prediction model using both global and local structural constraints derived from density maps, driving predictions towards conformational states consistent with the experimental data. We demonstrate that this flexible yet powerful inference-time approach allows us to build atomic models into heterogeneous cryo-EM maps across a variety of dynamic biomolecular systems including transporters and antibodies.
LGDec 12, 2024
Mixture of neural fields for heterogeneous reconstruction in cryo-EMAxel Levy, Rishwanth Raghu, David Shustin et al.
Cryo-electron microscopy (cryo-EM) is an experimental technique for protein structure determination that images an ensemble of macromolecules in near-physiological contexts. While recent advances enable the reconstruction of dynamic conformations of a single biomolecular complex, current methods do not adequately model samples with mixed conformational and compositional heterogeneity. In particular, datasets containing mixtures of multiple proteins require the joint inference of structure, pose, compositional class, and conformational states for 3D reconstruction. Here, we present Hydra, an approach that models both conformational and compositional heterogeneity fully ab initio by parameterizing structures as arising from one of K neural fields. We employ a new likelihood-based loss function and demonstrate the effectiveness of our approach on synthetic datasets composed of mixtures of proteins with large degrees of conformational variability. We additionally demonstrate Hydra on an experimental dataset of a cellular lysate containing a mixture of different protein complexes. Hydra expands the expressivity of heterogeneous reconstruction methods and thus broadens the scope of cryo-EM to increasingly complex samples.
LGJun 6, 2024
Solving Inverse Problems in Protein Space Using Diffusion-Based PriorsAxel Levy, Eric R. Chan, Sara Fridovich-Keil et al.
The interaction of a protein with its environment can be understood and controlled via its 3D structure. Experimental methods for protein structure determination, such as X-ray crystallography or cryogenic electron microscopy, shed light on biological processes but introduce challenging inverse problems. Learning-based approaches have emerged as accurate and efficient methods to solve these inverse problems for 3D structure determination, but are specialized for a predefined type of measurement. Here, we introduce a versatile framework to turn biophysical measurements, such as cryo-EM density maps, into 3D atomic models. Our method combines a physics-based forward model of the measurement process with a pretrained generative model providing a task-agnostic, data-driven prior. Our method outperforms posterior sampling baselines on linear and non-linear inverse problems. In particular, it is the first diffusion-based method for refining atomic models from cryo-EM maps and building atomic models from sparse distance matrices.
CVSep 25, 2019
Explicitly disentangling image content from translation and rotation with spatial-VAETristan Bepler, Ellen D. Zhong, Kotaro Kelley et al.
Given an image dataset, we are often interested in finding data generative factors that encode semantic content independently from pose variables such as rotation and translation. However, current disentanglement approaches do not impose any specific structure on the learned latent representations. We propose a method for explicitly disentangling image rotation and translation from other unstructured latent factors in a variational autoencoder (VAE) framework. By formulating the generative model as a function of the spatial coordinate, we make the reconstruction error differentiable with respect to latent translation and rotation parameters. This formulation allows us to train a neural network to perform approximate inference on these latent variables while explicitly constraining them to only represent rotation and translation. We demonstrate that this framework, termed spatial-VAE, effectively learns latent representations that disentangle image rotation and translation from content and improves reconstruction over standard VAEs on several benchmark datasets, including applications to modeling continuous 2-D views of proteins from single particle electron microscopy and galaxies in astronomical images.
QMSep 11, 2019
Reconstructing continuous distributions of 3D protein structure from cryo-EM imagesEllen D. Zhong, Tristan Bepler, Joseph H. Davis et al.
Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structure of proteins and other macromolecular complexes at near-atomic resolution. In single particle cryo-EM, the central problem is to reconstruct the three-dimensional structure of a macromolecule from $10^{4-7}$ noisy and randomly oriented two-dimensional projections. However, the imaged protein complexes may exhibit structural variability, which complicates reconstruction and is typically addressed using discrete clustering approaches that fail to capture the full range of protein dynamics. Here, we introduce a novel method for cryo-EM reconstruction that extends naturally to modeling continuous generative factors of structural heterogeneity. This method encodes structures in Fourier space using coordinate-based deep neural networks, and trains these networks from unlabeled 2D cryo-EM images by combining exact inference over image orientation with variational inference for structural heterogeneity. We demonstrate that the proposed method, termed cryoDRGN, can perform ab initio reconstruction of 3D protein complexes from simulated and real 2D cryo-EM image data. To our knowledge, cryoDRGN is the first neural network-based approach for cryo-EM reconstruction and the first end-to-end method for directly reconstructing continuous ensembles of protein structures from cryo-EM images.