Guoqing Zhou

LG
h-index36
8papers
55citations
Novelty54%
AI Score52

8 Papers

84.1QMMar 16Code
Fold-CP: A Context Parallelism Framework for Biomolecular Modeling

Dejun Lin, Simon Chu, Vishanth Iyer et al.

Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding models across multiple GPUs. We use the Boltz models as open source reference architectures and implement custom multidimensional primitives that efficiently parallelize both the dense triangular updates and the irregular, data-dependent pattern of window-batched local attention. Our approach achieves efficient memory scaling; for an N-token input distributed across P GPUs, per-device memory scales as $O(N^2/P)$, enabling the structure prediction of assemblies exceeding 30,000 residues on 64 NVIDIA B300 GPUs. We demonstrate the scientific utility of this approach through successful developer use cases: Fold-CP enabled the scoring of over 90% of Comprehensive Resource of Mammalian protein complexes (CORUM) database, as well as folding of disease-relevant PI4KA lipid kinase complex bound to an intrinsically disordered region without cropping. By providing a scalable pathway for modeling massive systems with full global context, Fold-CP represents a significant step toward the realization of a virtual cell.

100.0LGMar 30
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Kieran Didi, Zuobai Zhang, Guoqing Zhou et al.

Protein interaction modeling is central to protein design, which has been transformed by machine learning with applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Proteina-Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend recent flow-based latent protein generation architectures and leverage the domain-domain interactions of monomeric computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Proteina-Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We also demonstrate interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.

IVApr 1, 2022
Epipolar Focus Spectrum: A Novel Light Field Representation and Application in Dense-view Reconstruction

Yaning Li, Xue Wang, Hao Zhu et al.

Existing light field representations, such as epipolar plane image (EPI) and sub-aperture images, do not consider the structural characteristics across the views, so they usually require additional disparity and spatial structure cues for follow-up tasks. Besides, they have difficulties dealing with occlusions or larger disparity scenes. To this end, this paper proposes a novel Epipolar Focus Spectrum (EFS) representation by rearranging the EPI spectrum. Different from the classical EPI representation where an EPI line corresponds to a specific depth, there is a one-to-one mapping from the EFS line to the view. Accordingly, compared to a sparsely-sampled light field, a densely-sampled one with the same field of view (FoV) leads to a more compact distribution of such linear structures in the double-cone-shaped region with the identical opening angle in its corresponding EFS. Hence the EFS representation is invariant to the scene depth. To demonstrate its effectiveness, we develop a trainable EFS-based pipeline for light field reconstruction, where a dense light field can be reconstructed by compensating the "missing EFS lines" given a sparse light field, yielding promising results with cross-view consistency, especially in the presence of severe occlusion and large disparity. Experimental results on both synthetic and real-world datasets demonstrate the validity and superiority of the proposed method over SOTA methods.

LGNov 15, 2024Code
BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

Peter St. John, Dejun Lin, Polina Binder et al.

Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.

CVDec 19, 2024Code
Bright-NeRF:Brightening Neural Radiance Field with Color Restoration from Low-light Raw Images

Min Wang, Xin Huang, Guoqing Zhou et al.

Neural Radiance Fields (NeRFs) have demonstrated prominent performance in novel view synthesis. However, their input heavily relies on image acquisition under normal light conditions, making it challenging to learn accurate scene representation in low-light environments where images typically exhibit significant noise and severe color distortion. To address these challenges, we propose a novel approach, Bright-NeRF, which learns enhanced and high-quality radiance fields from multi-view low-light raw images in an unsupervised manner. Our method simultaneously achieves color restoration, denoising, and enhanced novel view synthesis. Specifically, we leverage a physically-inspired model of the sensor's response to illumination and introduce a chromatic adaptation loss to constrain the learning of response, enabling consistent color perception of objects regardless of lighting conditions. We further utilize the raw data's properties to expose the scene's intensity automatically. Additionally, we have collected a multi-view low-light raw image dataset to advance research in this field. Experimental results demonstrate that our proposed method significantly outperforms existing 2D and 3D approaches. Our code and dataset will be made publicly available.

LGJul 13, 2025
Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow

Zhonglin Cao, Mario Geiger, Allan dos Santos Costa et al.

Fast and accurate generation of molecular conformers is desired for downstream computational chemistry and drug discovery tasks. Currently, training and sampling state-of-the-art diffusion or flow-based models for conformer generation require significant computational resources. In this work, we build upon flow-matching and propose two mechanisms for accelerating training and inference of generative models for 3D molecular conformer generation. For fast training, we introduce the SO(3)-Averaged Flow training objective, which leads to faster convergence to better generation quality compared to conditional optimal transport flow or Kabsch-aligned flow. We demonstrate that models trained using SO(3)-Averaged Flow can reach state-of-the-art conformer generation quality. For fast inference, we show that the reflow and distillation methods of flow-based models enable few-steps or even one-step molecular conformer generation with high quality. The training techniques proposed in this work show a path towards highly efficient molecular conformer generation with flow-based models.

IVJan 23, 2021
Deep Anti-aliasing of Whole Focal Stack Using Slice Spectrum

Yaning Li, Xue Wang, Hao Zhu et al.

The paper aims at removing the aliasing effects of the whole focal stack generated from a sparse-sampled {4D} light field, while keeping the consistency across all the focal layers. We first explore the structural characteristics embedded in the focal stack slice and its corresponding frequency-domain representation, i.e., the Focal Stack Spectrum (FSS). We observe that the energy distribution of the FSS always resides within the same triangular area under different angular sampling rates, additionally the continuity of the Point Spread Function (PSF) is intrinsically maintained in the FSS. Based on these two observations, we propose a learning-based FSS reconstruction approach for one-time aliasing removing over the whole focal stack. Moreover, a novel conjugate-symmetric loss function is proposed for the optimization. Compared to previous works, our method avoids an explicit depth estimation, and can handle challenging large-disparity scenarios. Experimental results on both synthetic and real light field datasets show the superiority of the proposed approach for different scenes and various angular sampling rates.

CVJun 14, 2018
Dense Light Field Reconstruction From Sparse Sampling Using Residual Network

Mantang Guo, Hao Zhu, Guoqing Zhou et al.

A light field records numerous light rays from a real-world scene. However, capturing a dense light field by existing devices is a time-consuming process. Besides, reconstructing a large amount of light rays equivalent to multiple light fields using sparse sampling arises a severe challenge for existing methods. In this paper, we present a learning based method to reconstruct multiple novel light fields between two mutually independent light fields. We indicate that light rays distributed in different light fields have the same consistent constraints under a certain condition. The most significant constraint is a depth related correlation between angular and spatial dimensions. Our method avoids working out the error-sensitive constraint by employing a deep neural network. We solve residual values of pixels on epipolar plane image (EPI) to reconstruct novel light fields. Our method is able to reconstruct 2 to 4 novel light fields between two mutually independent input light fields. We also compare our results with those yielded by a number of alternatives elsewhere in the literature, which shows our reconstructed light fields have better structure similarity and occlusion relationship.