LGJun 9, 2022Code
A Unification Framework for Euclidean and Hyperbolic Graph Neural NetworksMehrdad Khatir, Nurendra Choudhary, Sutanay Choudhury et al.
Hyperbolic neural networks can effectively capture the inherent hierarchy of graph datasets, and consequently a powerful choice of GNNs. However, they entangle multiple incongruent (gyro-)vector spaces within a layer, which makes them limited in terms of generalization and scalability. In this work, we propose the Poincare disk model as our search space, and apply all approximations on the disk (as if the disk is a tangent space derived from the origin), thus getting rid of all inter-space transformations. Such an approach enables us to propose a hyperbolic normalization layer and to further simplify the entire hyperbolic model to a Euclidean model cascaded with our hyperbolic normalization layer. We applied our proposed nonlinear hyperbolic normalization to the current state-of-the-art homogeneous and multi-relational graph networks. We demonstrate that our model not only leverages the power of Euclidean networks such as interpretability and efficient execution of various model components, but also outperforms both Euclidean and hyperbolic counterparts on various benchmarks. Our code is made publicly available at https://github.com/oom-debugger/ijcai23.
DCMar 15, 2023
Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous ResourcesLogan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson et al.
Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach is our use of cloud-hosted management services to manage challenging aspects of cross-resource authentication and authorization, function-as-a-service (FaaS) function invocation, and data transfer. We show that these methods can achieve performance parity with systems that rely on direct connection between resources. We achieve parity by integrating the FaaS system and data transfer capabilities with a system that passes data by reference among managers and workers, and a user-configurable steering algorithm to hide data transfer latencies. We anticipate that this ease of use can enable routine use of heterogeneous resources in computational science.
LGNov 25, 2022
Extreme Acceleration of Graph Neural Network-based Prediction Models for Quantum ChemistryHatem Helal, Jesun Firoz, Jenna Bilbrey et al.
Molecular property calculations are the bedrock of chemical physics. High-fidelity \textit{ab initio} modeling techniques for computing the molecular properties can be prohibitively expensive, and motivate the development of machine-learning models that make the same predictions more efficiently. Training graph neural networks over large molecular databases introduces unique computational challenges such as the need to process millions of small graphs with variable size and support communication patterns that are distinct from learning over large graphs such as social networks. This paper demonstrates a novel hardware-software co-design approach to scale up the training of graph neural networks for molecular property prediction. We introduce an algorithm to coalesce the batches of molecular graphs into fixed size packs to eliminate redundant computation and memory associated with alternative padding techniques and improve throughput via minimizing communication. We demonstrate the effectiveness of our co-design approach by providing an implementation of a well-established molecular property prediction model on the Graphcore Intelligence Processing Units (IPU). We evaluate the training performance on multiple molecular graph databases with varying degrees of graph counts, sizes and sparsity. We demonstrate that such a co-design approach can reduce the training time of such molecular property prediction models from days to less than two hours, opening new possibilities for AI-driven scientific discovery.
AIOct 22, 2023
Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst DesignHenry W. Sprueill, Carl Edwards, Mariefel V. Olarte et al.
Discovering novel catalysts requires complex reasoning involving multiple chemical properties and resultant trade-offs, leading to a combinatorial growth in the search space. While large language models (LLM) have demonstrated novel capabilities for chemistry through complex instruction following capabilities and high quality reasoning, a goal-driven combinatorial search using LLMs has not been explored in detail. In this work, we present a Monte Carlo Tree Search-based approach that improves beyond state-of-the-art chain-of-thought prompting variants to augment scientific reasoning. We introduce two new reasoning datasets: 1) a curation of computational chemistry simulations, and 2) diverse questions written by catalysis researchers for reasoning about novel chemical conversion processes. We improve over the best baseline by 25.8\% and find that our approach can augment scientist's reasoning and discovery process with novel insights.
LGNov 8, 2022
Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI AcceleratorsJenna A. Bilbrey, Kristina M. Herman, Henry Sprueill et al.
The demonstrated success of transfer learning has popularized approaches that involve pretraining models from massive data sources and subsequent finetuning towards a specific task. While such approaches have become the norm in fields such as natural language processing, implementation and evaluation of transfer learning approaches for chemistry are in the early stages. In this work, we demonstrate finetuning for downstream tasks on a graph neural network (GNN) trained over a molecular database containing 2.7 million water clusters. The use of Graphcore IPUs as an AI accelerator for training molecular GNNs reduces training time from a reported 2.7 days on 0.5M clusters to 1.2 hours on 2.7M clusters. Finetuning the pretrained model for downstream tasks of molecular dynamics and transfer to a different potential energy surface took only 8.3 hours and 28 minutes, respectively, on a single GPU.
ARNov 8, 2023
Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUsHongwu Peng, Caiwen Ding, Tong Geng et al.
The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Traditional computing architectures, based on the von Neumann model, are being outstripped by the requirements of contemporary AI/ML algorithms, leading to a surge in the creation of accelerators like the Graphcore Intelligence Processing Unit (IPU), Sambanova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms. These hardware accelerators are characterized by their innovative data-flow architectures and other design optimizations that promise to deliver superior performance and energy efficiency for AI/ML tasks. This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators, delving into their hardware and software design features to discern their strengths and unique capabilities. By conducting a series of benchmark evaluations on common DNN operators and other AI/ML workloads, we aim to illuminate the advantages of data-flow architectures over conventional processor designs and offer insights into the performance trade-offs of each platform. The findings from our study will serve as a valuable reference for the design and performance expectations of research prototypes, thereby facilitating the development of next-generation hardware accelerators tailored for the ever-evolving landscape of AI/ML applications. Through this analysis, we aspire to contribute to the broader understanding of current accelerator technologies and to provide guidance for future innovations in the field.
LGMay 24, 2022
TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch ModelsAndrew Engel, Zhichao Wang, Anand D. Sarwate et al.
We introduce torchNTK, a python library to calculate the empirical neural tangent kernel (NTK) of neural network models in the PyTorch framework. We provide an efficient method to calculate the NTK of multilayer perceptrons. We compare the explicit differentiation implementation against autodifferentiation implementations, which have the benefit of extending the utility of the library to any architecture supported by PyTorch, such as convolutional networks. A feature of the library is that we expose the user to layerwise NTK components, and show that in some regimes a layerwise calculation is more memory efficient. We conduct preliminary experiments to demonstrate use cases for the software and probe the NTK.
LGMay 23, 2023Code
Faithful and Efficient Explanations for Neural Networks via Neural Tangent Kernel Surrogate ModelsAndrew Engel, Zhichao Wang, Natalie S. Frank et al.
A recent trend in explainable AI research has focused on surrogate modeling, where neural networks are approximated as simpler ML algorithms such as kernel machines. A second trend has been to utilize kernel functions in various explain-by-example or data attribution tasks. In this work, we combine these two trends to analyze approximate empirical neural tangent kernels (eNTK) for data attribution. Approximation is critical for eNTK analysis due to the high computational cost to compute the eNTK. We define new approximate eNTK and perform novel analysis on how well the resulting kernel machine surrogate models correlate with the underlying neural network. We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation. We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models, with the introduced trace NTK the most consistent performer. Open source software allowing users to efficiently calculate kernel functions in the PyTorch framework is available (https://github.com/pnnl/projection\_ntk).
BMMay 7, 2021Code
Evening the Score: Targeting SARS-CoV-2 Protease Inhibition in Graph Generative Models for Therapeutic CandidatesJenna Bilbrey, Logan Ward, Sutanay Choudhury et al.
We examine a pair of graph generative models for the therapeutic design of novel drug candidates targeting SARS-CoV-2 viral proteins. Due to a sense of urgency, we chose well-validated models with unique strengths: an autoencoder that generates molecules with similar structures to a dataset of drugs with anti-SARS activity and a reinforcement learning algorithm that generates highly novel molecules. During generation, we explore optimization toward several design targets to balance druglikeness, synthetic accessability, and anti-SARS activity based on \icfifty. This generative framework\footnote{https://github.com/exalearn/covid-drug-design} will accelerate drug discovery in future pandemics through the high-throughput generation of targeted therapeutic candidates.
AIFeb 9, 2024
GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph EncodingStefan Dernbach, Khushbu Agarwal, Alejandro Zuniga et al.
Integrating large language models (LLMs) with knowledge graphs derived from domain-specific data represents an important advancement towards more powerful and factual reasoning. As these models grow more capable, it is crucial to enable them to perform multi-step inferences over real-world knowledge graphs while minimizing hallucination. While large language models excel at conversation and text generation, their ability to reason over domain-specialized graphs of interconnected entities remains limited. For example, can we query a LLM to identify the optimal contact in a professional network for a specific goal, based on relationships and attributes in a private database? The answer is no--such capabilities lie beyond current methods. However, this question underscores a critical technical gap that must be addressed. Many high-value applications in areas such as science, security, and e-commerce rely on proprietary knowledge graphs encoding unique structures, relationships, and logical constraints. We introduce a fine-tuning framework for developing Graph-aligned LAnguage Models (GLaM) that transforms a knowledge graph into an alternate text representation with labeled question-answer pairs. We demonstrate that grounding the models in specific graph-based knowledge expands the models' capacity for structure-based reasoning. Our methodology leverages the large-language model's generative capabilities to create the dataset and proposes an efficient alternate to retrieval-augmented generation styled methods.
CHEM-PHFeb 15, 2024
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical FeedbackHenry W. Sprueill, Carl Edwards, Khushbu Agarwal et al.
The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and reaction energy barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing competitive performance against expert-enumerated chemical descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.
DCApr 14, 2025
Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACEJesun Firoz, Franco Pellegrini, Mario Geiger et al.
Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scientists. These models facilitate the understanding of matter and the discovery of new molecules and materials. In contrast to GNNs operating on a large homogeneous graphs, GNNs used by CFMs process a large number of geometric graphs of varying sizes, requiring different optimization strategies than those developed for large homogeneous GNNs. This paper presents optimizations for two critical phases of CFM training: data distribution and model training, targeting MACE - a state-of-the-art CFM. We address the challenge of load balancing in data distribution by formulating it as a multi-objective bin packing problem. We propose an iterative algorithm that provides a highly effective, fast, and practical solution, ensuring efficient data distribution. For the training phase, we identify symmetric tensor contraction as the key computational kernel in MACE and optimize this kernel to improve the overall performance. Our combined approach of balanced data distribution and kernel optimization significantly enhances the training process of MACE. Experimental results demonstrate a substantial speedup, reducing per-epoch execution time for training from 12 to 2 minutes on 740 GPUs with a 2.6M sample dataset.
AIDec 10, 2024
Thinking Fast and Laterally: Multi-Agentic Approach for Reasoning about Uncertain Emerging EventsStefan Dernbach, Alejandro Michel, Khushbu Agarwal et al.
This paper introduces lateral thinking to implement System-2 reasoning capabilities in AI systems, focusing on anticipatory and causal reasoning under uncertainty. We present a framework for systematic generation and modeling of lateral thinking queries and evaluation datasets. We introduce Streaming Agentic Lateral Thinking (SALT), a multi-agent framework designed to process complex, low-specificity queries in streaming data environments. SALT implements lateral thinking-inspired System-2 reasoning through a dynamic communication structure between specialized agents. Our key insight is that lateral information flow across long-distance agent interactions, combined with fine-grained belief management, yields richer information contexts and enhanced reasoning. Preliminary quantitative and qualitative evaluations indicate SALT's potential to outperform single-agent systems in handling complex lateral reasoning tasks in a streaming environment.
DCMay 4, 2023
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUsJou-An Chen, Hsin-Hsuan Sung, Xipeng Shen et al.
Recent studies have shown that Binary Graph Neural Networks (GNNs) are promising for saving computations of GNNs through binarized tensors. Prior work, however, mainly focused on algorithm designs or training techniques, leaving it open to how to materialize the performance potential on accelerator hardware fully. This work redesigns the binary GNN inference backend from the efficiency perspective. It fills the gap by proposing a series of abstractions and techniques to map binary GNNs and their computations best to fit the nature of bit manipulations on GPUs. Results on real-world graphs with GCNs, GraphSAGE, and GraphSAINT show that the proposed techniques outperform state-of-the-art binary GNN implementations by 8-22X with the same accuracy maintained. BitGNN code is publicly available.
CLAug 1, 2021
Attention-based Aspect Reasoning for Knowledge Base Question Answering on Clinical NotesPing Wang, Tian Shi, Khushbu Agarwal et al.
Question Answering (QA) in clinical notes has gained a lot of attention in the past few years. Existing machine reading comprehension approaches in clinical domain can only handle questions about a single block of clinical texts and fail to retrieve information about multiple patients and their clinical notes. To handle more complex questions, we aim at creating knowledge base from clinical notes to link different patients and clinical notes, and performing knowledge base question answering (KBQA). Based on the expert annotations available in the n2c2 dataset, we first created the ClinicalKBQA dataset that includes around 9K QA pairs and covers questions about seven medical topics using more than 300 question templates. Then, we investigated an attention-based aspect reasoning (AAR) method for KBQA and analyzed the impact of different aspects of answers (e.g., entity, type, path, and context) for prediction. The AAR method achieves better performance due to the well-designed encoder and attention mechanism. From our experiments, we find that both aspects, type and path, enable the model to identify answers satisfying the general conditions and produce lower precision and higher recall. On the other hand, the aspects, entity and context, limit the answers by node-specific information and lead to higher precision and lower recall.
LGFeb 9, 2021
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19Logan Ward, Jenna A. Bilbrey, Sutanay Choudhury et al.
Design of new drug compounds with target properties is a key area of research in generative modeling. We present a small drug molecule design pipeline based on graph-generative models and a comparison study of two state-of-the-art graph generative models for designing COVID-19 targeted drug candidates: 1) a variational autoencoder-based approach (VAE) that uses prior knowledge of molecules that have been shown to be effective for earlier coronavirus treatments and 2) a deep Q-learning method (DQN) that generates optimized molecules without any proximity constraints. We evaluate the novelty of the automated molecule generation approaches by validating the candidate molecules with drug-protein binding affinity models. The VAE method produced two novel molecules with similar structures to the antiretroviral protease inhibitor Indinavir that show potential binding affinity for the SARS-CoV-2 protein target 3-chymotrypsin-like protease (3CL-protease).
LGNov 30, 2020
HydroNet: Benchmark Tasks for Preserving Intermolecular Interactions and Structural Motifs in Predictive and Generative Models for Molecular DataSutanay Choudhury, Jenna A. Bilbrey, Logan Ward et al.
Intermolecular and long-range interactions are central to phenomena as diverse as gene regulation, topological states of quantum materials, electrolyte transport in batteries, and the universal solvation properties of water. We present a set of challenge problems for preserving intermolecular interactions and structural motifs in machine-learning approaches to chemical problems, through the use of a recently published dataset of 4.95 million water clusters held together by hydrogen bonding interactions and resulting in longer range structural patterns. The dataset provides spatial coordinates as well as two types of graph representations, to accommodate a variety of machine-learning practices.
LGJul 22, 2020
Self-Supervised Learning of Contextual Embeddings for Link Prediction in Heterogeneous NetworksPing Wang, Khushbu Agarwal, Colby Ham et al.
Representation learning methods for heterogeneous networks produce a low-dimensional vector embedding for each node that is typically fixed for all tasks involving the node. Many of the existing methods focus on obtaining a static vector representation for a node in a way that is agnostic to the downstream application where it is being used. In practice, however, downstream tasks such as link prediction require specific contextual information that can be extracted from the subgraphs related to the nodes provided as input to the task. To tackle this challenge, we develop SLiCE, a framework bridging static representation learning methods using global information from the entire graph with localized attention driven mechanisms to learn contextual node representations. We first pre-train our model in a self-supervised manner by introducing higher-order semantic associations and masking nodes, and then fine-tune our model for a specific link prediction task. Instead of training node representations by aggregating information from all semantic neighbors connected via metapaths, we automatically learn the composition of different metapaths that characterize the context for a specific task without the need for any pre-defined metapaths. SLiCE significantly outperforms both static and contextual embedding learning methods on several publicly available benchmark network datasets. We also interpret the semantic association matrix and provide its utility and relevance in making successful link predictions between heterogeneous nodes in the network.
LGJul 19, 2019
Snomed2Vec: Random Walk and Poincaré Embeddings of a Clinical Knowledge Base for Healthcare AnalyticsKhushbu Agarwal, Tome Eftimov, Raghavendra Addanki et al.
Representation learning methods that transform encoded data (e.g., diagnosis and drug codes) into continuous vector spaces (i.e., vector embeddings) are critical for the application of deep learning in healthcare. Initial work in this area explored the use of variants of the word2vec algorithm to learn embeddings for medical concepts from electronic health records or medical claims datasets. We propose learning embeddings for medical concepts by using graph-based representation learning methods on SNOMED-CT, a widely popular knowledge graph in the healthcare domain with numerous operational and research applications. Current work presents an empirical analysis of various embedding methods, including the evaluation of their performance on multiple tasks of biomedical relevance (node classification, link prediction, and patient state prediction). Our results show that concept embeddings derived from the SNOMED-CT knowledge graph significantly outperform state-of-the-art embeddings, showing 5-6x improvement in ``concept similarity" and 6-20\% improvement in patient diagnosis.
CRMay 3, 2019
Enterprise Cyber Resiliency Against Lateral Movement: A Graph Theoretic ApproachPin-Yu Chen, Sutanay Choudhury, Luke Rodriguez et al.
Lateral movement attacks are a serious threat to enterprise security. In these attacks, an attacker compromises a trusted user account to get a foothold into the enterprise network and uses it to attack other trusted users, increasingly gaining higher and higher privileges. Such lateral attacks are very hard to model because of the unwitting role that users play in the attack and even harder to detect and prevent because of their low and slow nature. In this paper, a theoretical framework is presented for modeling lateral movement attacks and for proposing a methodology for designing resilient cyber systems against such attacks. The enterprise is modeled as a tripartite graph capturing the interaction between users, machines, and applications, and a set of procedures is proposed to harden the network by increasing the cost of lateral movement. Strong theoretical guarantees on system resilience are established and experimentally validated for large enterprise networks.
LGJan 14, 2016
Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge GraphsBaichuan Zhang, Sutanay Choudhury, Mohammad Al Hasan et al.
Link prediction, or predicting the likelihood of a link in a knowledge graph based on its existing state is a key research task. It differs from a traditional link prediction task in that the links in a knowledge graph are categorized into different predicates and the link prediction performance of different predicates in a knowledge graph generally varies widely. In this work, we propose a latent feature embedding based link prediction model which considers the prediction task for each predicate disjointly. To learn the model parameters it utilizes a Bayesian personalized ranking based optimization technique. Experimental results on large-scale knowledge bases such as YAGO2 show that our link prediction approach achieves substantially higher performance than several state-of-art approaches. We also show that for a given predicate the topological properties of the knowledge graph induced by the given predicate edges are key indicators of the link prediction performance of that predicate in the knowledge graph.
SIDec 23, 2015
Multi-centrality Graph Spectral Decompositions and their Application to Cyber Intrusion DetectionPin-Yu Chen, Sutanay Choudhury, Alfred O. Hero
Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the Fiedler vector of centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles of graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity pattern and MC-GDL can provide discriminative basis for attack classification.
CLJul 14, 2014
Benchmarking Named Entity Disambiguation approaches for Streaming GraphsSutanay Choudhury, Chase Dowling
Named Entity Disambiaguation (NED) is a central task for applications dealing with natural language text. Assume that we have a graph based knowledge base (subsequently referred as Knowledge Graph) where nodes represent various real world entities such as people, location, organization and concepts. Given data sources such as social media streams and web pages Entity Linking is the task of mapping named entities that are extracted from the data to those present in the Knowledge Graph. This is an inherently difficult task due to several reasons. Almost all these data sources are generated without any formal ontology; the unstructured nature of the input, limited context and the ambiguity involved when multiple entities are mapped to the same name make this a hard task. This report looks at two state of the art systems employing two distinctive approaches: graph based Accurate Online Disambiguation of Entities (AIDA) and Mined Evidence Named Entity Disambiguation (MENED), which employs a statistical inference approach. We compare both approaches using the data set and queries provided by the Knowledge Base Population (KBP) track at 2011 NIST Text Analytics Conference (TAC). This report begins with an overview of the respective approaches, followed by detailed description of the experimental setup. It concludes with our findings from the benchmarking exercise.
CRApr 24, 2013
Towards a Networks-of-Networks Framework for Cyber SecurityMahantesh Halappanavar, Sutanay Choudhury, Emilie Hogan et al.
Networks-of-networks (NoN) is a graph-theoretic model of interdependent networks that have distinct dynamics at each network (layer). By adding special edges to represent relationships between nodes in different layers, NoN provides a unified mechanism to study interdependent systems intertwined in a complex relationship. While NoN based models have been proposed for cyber-physical systems, in this position paper we build towards a three-layered NoN model for an enterprise cyber system. Each layer captures a different facet of a cyber system. We present in-depth discussion for four major graph- theoretic applications to demonstrate how the three-layered NoN model can be leveraged for continuous system monitoring and mission assurance.