Guojing Cong

LG
h-index13
14papers
375citations
Novelty49%
AI Score44

14 Papers

AIOct 6, 2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang et al. · microsoft-research

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.

MED-PHMay 25, 2022
AI-aided multiscale modeling of physiologically-significant blood clots

Yicong Zhu, Changnian Han, Peng Zhang et al.

We have developed an AI-aided multiple time stepping (AI-MTS) algorithm and multiscale modeling framework (AI-MSM) and implemented them on the Summit-like supercomputer, AIMOS. AI-MSM is the first of its kind to integrate multi-physics, including intra-platelet, inter-platelet, and fluid-platelet interactions, into one system. It has simulated a record-setting multiscale blood clotting model of 102 million particles, of which 70 flowing and 180 aggregating platelets, under dissipative particle dynamics to coarse-grained molecular dynamics. By adaptively adjusting timestep sizes to match the characteristic time scales of the underlying dynamics, AI-MTS optimally balances speeds and accuracies of the simulations.

MTRL-SCIAug 22, 2022
Prediction of $\textrm{CO}_2$ Adsorption in Nano-Pores with Graph Neural Networks

Guojing Cong, Anshul Gupta, Rodrigo Neumann et al.

We investigate the graph-based convolutional neural network approach for predicting and ranking gas adsorption properties of crystalline Metal-Organic Framework (MOF) adsorbents for application in post-combustion capture of $\textrm{CO}_2$. Our model is based solely on standard structural input files containing atomistic descriptions of the adsorbent material candidates. We construct novel methodological extensions to match the prediction accuracy of classical machine learning models that were built with hundreds of features at much higher computational cost. Our approach can be more broadly applied to optimize gas capture processes at industrial scale.

LGDec 3, 2025
VS-Graph: Scalable and Efficient Graph Classification Using Hyperdimensional Computing

Hamed Poursiami, Shay Snyder, Guojing Cong et al.

Graph classification is a fundamental task in domains ranging from molecular property prediction to materials design. While graph neural networks (GNNs) achieve strong performance by learning expressive representations via message passing, they incur high computational costs, limiting their scalability and deployment on resource-constrained devices. Hyperdimensional Computing (HDC), also known as Vector Symbolic Architectures (VSA), offers a lightweight, brain-inspired alternative, yet existing HDC-based graph methods typically struggle to match the predictive performance of GNNs. In this work, we propose VS-Graph, a vector-symbolic graph learning framework that narrows the gap between the efficiency of HDC and the expressive power of message passing. VS-Graph introduces a Spike Diffusion mechanism for topology-driven node identification and an Associative Message Passing scheme for multi-hop neighborhood aggregation entirely within the high-dimensional vector space. Without gradient-based optimization or backpropagation, our method achieves competitive accuracy with modern GNNs, outperforming the prior HDC baseline by 4-5% on standard benchmarks such as MUTAG and DD. It also matches or exceeds the performance of the GNN baselines on several datasets while accelerating the training by a factor of up to 450x. Furthermore, VS-Graph maintains high accuracy even with the hypervector dimensionality reduced to D=128, demonstrating robustness under aggressive dimension compression and paving the way for ultra-efficient execution on edge and neuromorphic hardware.

DCDec 20, 2023
Optimizing Distributed Training on Frontier for Large Language Models

Sajal Dash, Isaac Lyngaas, Junqi Yin et al.

Large language models (LLMs) have demonstrated remarkable success as foundational models, benefiting various downstream applications through fine-tuning. Recent studies on loss scaling have demonstrated the superior performance of larger LLMs compared to their smaller counterparts. Nevertheless, training LLMs with billions of parameters poses significant challenges and requires considerable computational resources. For example, training a one trillion parameter GPT-style model on 20 trillion tokens requires a staggering 120 million exaflops of computation. This research explores efficient distributed training strategies to extract this computation from Frontier, the world's first exascale supercomputer dedicated to open science. We enable and investigate various model and data parallel training techniques, such as tensor parallelism, pipeline parallelism, and sharded data parallelism, to facilitate training a trillion-parameter model on Frontier. We empirically assess these techniques and their associated parameters to determine their impact on memory footprint, communication latency, and GPU's computational efficiency. We analyze the complex interplay among these techniques and find a strategy to combine them to achieve high throughput through hyperparameter tuning. We have identified efficient strategies for training large LLMs of varying sizes through empirical analysis and hyperparameter tuning. For 22 Billion, 175 Billion, and 1 Trillion parameters, we achieved GPU throughputs of $38.38\%$, $36.14\%$, and $31.96\%$, respectively. For the training of the 175 Billion parameter model and the 1 Trillion parameter model, we achieved $100\%$ weak scaling efficiency on 1024 and 3072 MI250X GPUs, respectively. We also achieved strong scaling efficiencies of $89\%$ and $87\%$ for these two models.

ETApr 25, 2024
Transductive Spiking Graph Neural Networks for Loihi

Shay Snyder, Victoria Clerico, Guojing Cong et al.

Graph neural networks have emerged as a specialized branch of deep learning, designed to address problems where pairwise relations between objects are crucial. Recent advancements utilize graph convolutional neural networks to extract features within graph structures. Despite promising results, these methods face challenges in real-world applications due to sparse features, resulting in inefficient resource utilization. Recent studies draw inspiration from the mammalian brain and employ spiking neural networks to model and learn graph structures. However, these approaches are limited to traditional Von Neumann-based computing systems, which still face hardware inefficiencies. In this study, we present a fully neuromorphic implementation of spiking graph neural networks designed for Loihi 2. We optimize network parameters using Lava Bayesian Optimization, a novel hyperparameter optimization system compatible with neuromorphic computing architectures. We showcase the performance benefits of combining neuromorphic Bayesian optimization with our approach for citation graph classification using fixed-precision spiking neurons. Our results demonstrate the capability of integer-precision, Loihi 2 compatible spiking neural networks in performing citation graph classification with comparable accuracy to existing floating point implementations.

LGOct 28, 2025
HyperGraphX: Graph Transductive Learning with Hyperdimensional Computing and Message Passing

Guojing Cong, Tom Potok, Hamed Poursiami et al.

We present a novel algorithm, \hdgc, that marries graph convolution with binding and bundling operations in hyperdimensional computing for transductive graph learning. For prediction accuracy \hdgc outperforms major and popular graph neural network implementations as well as state-of-the-art hyperdimensional computing implementations for a collection of homophilic graphs and heterophilic graphs. Compared with the most accurate learning methodologies we have tested, on the same target GPU platform, \hdgc is on average 9561.0 and 144.5 times faster than \gcnii, a graph neural network implementation and HDGL, a hyperdimensional computing implementation, respectively. As the majority of the learning operates on binary vectors, we expect outstanding energy performance of \hdgc on neuromorphic and emerging process-in-memory devices.

LGJul 2, 2025
Completion of the DrugMatrix Toxicogenomics Database using 3-Dimensional Tensors

Tan Nguyen, Guojing Cong

We explore applying a tensor completion approach to complete the DrugMatrix toxicogenomics dataset. Our hypothesis is that by preserving the 3-dimensional structure of the data, which comprises tissue, treatment, and transcriptomic measurements, and by leveraging a machine learning formulation, our approach will improve upon prior state-of-the-art results. Our results demonstrate that the new tensor-based method more accurately reflects the original data distribution and effectively captures organ-specific variability. The proposed tensor-based methodology achieved lower mean squared errors and mean absolute errors compared to both conventional Canonical Polyadic decomposition and 2-dimensional matrix factorization methods. In addition, our non-negative tensor completion implementation reveals relationships among tissues. Our findings not only complete the world's largest in-vivo toxicogenomics database with improved accuracy but also offer a promising methodology for future studies of drugs that may cross species barriers, for example, from rats to humans.

LGOct 11, 2024
Predicting Drug Effects from High-Dimensional, Asymmetric Drug Datasets by Using Graph Neural Networks: A Comprehensive Analysis of Multitarget Drug Effect Prediction

Avishek Bose, Guojing Cong

Graph neural networks (GNNs) have emerged as one of the most effective ML techniques for drug effect prediction from drug molecular graphs. Despite having immense potential, GNN models lack performance when using datasets that contain high-dimensional, asymmetrically co-occurrent drug effects as targets with complex correlations between them. Training individual learning models for each drug effect and incorporating every prediction result for a wide spectrum of drug effects are impractical. Therefore, an opportunity exists to address this challenge as multitarget prediction problems and predict all drug effects at a time. We developed standard and hybrid GNNs to perform two separate tasks: multiregression for continuous values and multilabel classification for categorical values contained in our datasets. Because multilabel classification makes the target data even more sparse and introduces asymmetric label co-occurrence, learning these models becomes difficult and heavily impacts the GNN's performance. To address these challenges, we propose a new data oversampling technique to improve multilabel classification performances on all the given imbalanced molecular graph datasets. Using the technique, we improve the data imbalance ratio of the drug effects while protecting the datasets' integrity. Finally, we evaluate the multilabel classification performance of the best-performing hybrid GNN model on all the oversampled datasets obtained from the proposed oversampling technique. In all the evaluation metrics (i.e., precision, recall, and F1 score), this model significantly outperforms other ML models, including GNN models when they are trained on the original datasets or oversampled datasets with MLSMOTE, which is a well-known oversampling technique.

LGOct 1, 2021
Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum

Guojing Cong, Tianyi Liu

Momentum method has been used extensively in optimizers for deep learning. Recent studies show that distributed training through K-step averaging has many nice properties. We propose a momentum method for such model averaging approaches. At each individual learner level traditional stochastic gradient is applied. At the meta-level (global learner level), one momentum term is applied and we call it block momentum. We analyze the convergence and scaling properties of such momentum methods. Our experimental results show that block momentum not only accelerates training, but also achieves better results.

LGNov 27, 2020
CASTELO: Clustered Atom Subtypes aidEd Lead Optimization -- a combined machine learning and molecular modeling method

Leili Zhang, Giacomo Domeniconi, Chih-Chieh Yang et al.

Drug discovery is a multi-stage process that comprises two costly major steps: pre-clinical research and clinical trials. Among its stages, lead optimization easily consumes more than half of the pre-clinical budget. We propose a combined machine learning and molecular modeling approach that automates lead optimization workflow \textit{in silico}. The initial data collection is achieved with physics-based molecular dynamics (MD) simulation. Contact matrices are calculated as the preliminary features extracted from the simulations. To take advantage of the temporal information from the simulations, we enhanced contact matrices data with temporal dynamism representation, which are then modeled with unsupervised convolutional variational autoencoder (CVAE). Finally, conventional clustering method and CVAE-based clustering method are compared with metrics to rank the submolecular structures and propose potential candidates for lead optimization. With no need for extensive structure-activity relationship database, our method provides new hints for drug modification hotspots which can be used to improve drug efficacy. Our workflow can potentially reduce the lead optimization turnaround time from months/years to days compared with the conventional labor-intensive process and thus can potentially become a valuable tool for medical researchers.

LGOct 2, 2019
Accelerating Data Loading in Deep Neural Network Training

Chih-Chieh Yang, Guojing Cong

Data loading can dominate deep neural network training time on large-scale systems. We present a comprehensive study on accelerating data loading performance in large-scale distributed training. We first identify performance and scalability issues in current data loading implementations. We then propose optimizations that utilize CPU resources to the data loader design. We use an analytical model to characterize the impact of data loading on the overall training time and establish the performance trend as we scale up distributed training. Our model suggests that I/O rate limits the scalability of distributed training, which inspires us to design a locality-aware data loading method. By utilizing software caches, our method can drastically reduce the data loading communication volume in comparison with the original data loading implementation. Finally, we evaluate the proposed optimizations with various experiments. We achieved more than 30x speedup in data loading using 256 nodes with 1,024 learners.

LGMar 12, 2019
A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction

Fan Zhou, Guojing Cong

Reducing communication in training large-scale machine learning applications on distributed platform is still a big challenge. To address this issue, we propose a distributed hierarchical averaging stochastic gradient descent (Hier-AVG) algorithm with infrequent global reduction by introducing local reduction. As a general type of parallel SGD, Hier-AVG can reproduce several popular synchronous parallel SGD variants by adjusting its parameters. We show that Hier-AVG with infrequent global reduction can still achieve standard convergence rate for non-convex optimization problems. In addition, we show that more frequent local averaging with more participants involved can lead to faster training convergence. By comparing Hier-AVG with another popular distributed training algorithm K-AVG, we show that through deploying local averaging with fewer number of global averaging, Hier-AVG can still achieve comparable training speed while frequently get better test accuracy. This indicates that local averaging can serve as an alternative remedy to effectively reduce communication overhead when the number of learners is large. Experimental results of Hier-AVG with several state-of-the-art deep neural nets on CIFAR-10 and IMAGENET-1K are presented to validate our analysis and show its superiority.

LGAug 3, 2017
On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization

Fan Zhou, Guojing Cong

Despite their popularity, the practical performance of asynchronous stochastic gradient descent methods (ASGD) for solving large scale machine learning problems are not as good as theoretical results indicate. We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm which we call K-AVG. We establish the convergence results of K-AVG for nonconvex objectives and explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent which is a special case of K-AVG with $K=1$. We also show that K-AVG scales better than ASGD. Another advantage of K-AVG over ASGD is that it allows larger stepsizes. On a cluster of $128$ GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence for \cifar dataset.