ARFeb 13, 2023Code
OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental ScienceMaksim Levental, Arham Khan, Ryan Chard et al.
In many experiment-driven scientific domains, such as high-energy physics, material science, and cosmology, high data rate experiments impose hard constraints on data acquisition systems: collected data must either be indiscriminately stored for post-processing and analysis, thereby necessitating large storage capacity, or accurately filtered in real-time, thereby necessitating low-latency processing. Deep neural networks, effective in other filtering tasks, have not been widely employed in such data acquisition systems, due to design and deployment difficulties. We present an open source, lightweight, compiler framework, without any proprietary dependencies, OpenHLS, based on high-level synthesis techniques, for translating high-level representations of deep neural networks to low-level representations, suitable for deployment to near-sensor devices such as field-programmable gate arrays. We evaluate OpenHLS on various workloads and present a case-study implementation of a deep neural network for Bragg peak detection in the context of high-energy diffraction microscopy. We show OpenHLS is able to produce an implementation of the network with a throughput 4.8 $μ$s/sample, which is approximately a 4$\times$ improvement over the existing implementation
AIJul 1, 2022
FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopyNikil Ravi, Pranshu Chaturvedi, E. A. Huerta et al.
A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, and funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.
LGAug 17, 2023
APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a ServiceZilinghan Li, Shilan He, Pranshu Chaturvedi et al.
Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to collaboratively train robust and generalized machine learning (ML) models without sharing sensitive (e.g., healthcare of financial) local data. To ease and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use platform that provides privacy-preserving cross-silo federated learning as a service. APPFLx employs Globus authentication to allow users to easily and securely invite trustworthy collaborators for PPFL, implements several synchronous and asynchronous FL algorithms, streamlines the FL experiment launch process, and enables tracking and visualizing the life cycle of FL experiments, allowing domain experts and ML practitioners to easily orchestrate and evaluate cross-silo FL under one platform. APPFLx is available online at https://appflx.link
DCAug 25, 2023
Linking the Dynamic PicoProbe Analytical Electron-Optical Beam Line / Microscope to SupercomputersAlexander Brace, Rafael Vescovi, Ryan Chard et al.
The Dynamic PicoProbe at Argonne National Laboratory is undergoing upgrades that will enable it to produce up to 100s of GB of data per day. While this data is highly important for both fundamental science and industrial applications, there is currently limited on-site infrastructure to handle these high-volume data streams. We address this problem by providing a software architecture capable of supporting large-scale data transfers to the neighboring supercomputers at the Argonne Leadership Computing Facility. To prepare for future scientific workflows, we implement two instructive use cases for hyperspectral and spatiotemporal datasets, which include: (i) off-site data transfer, (ii) machine learning/artificial intelligence and traditional data analysis approaches, and (iii) automatic metadata extraction and cataloging of experimental results. This infrastructure supports expected workloads and also provides domain scientists the ability to reinterrogate data from past experiments to yield additional scientific value and derive new insights.
DCAug 19, 2022
Globus Automation Services: Research process automation across the space-time continuumRyan Chard, Jim Pruyne, Kurt McKee et al.
Research process automation -- the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources -- has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions, \emph{flows}, and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from scientific instrument to remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: 1) cloud hosting for reliable execution of even long-lived flows despite sporadic failures; 2) a simple specification and extensible asynchronous action provider API, for defining and executing a wide variety of actions and flows involving heterogeneous resources; 3) an event-driven execution model for automating execution of flows in response to arbitrary events; and 4) a rich security model enabling authorization delegation mechanisms for secure execution of long-running actions across distributed resources. These services permit researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services, describe their design and implementation, present microbenchmark studies, and review experiences applying the services in a range of applications.
LGSep 20, 2022
Deep learning at the edge enables real-time streaming ptychographic imagingAnakha V Babu, Tao Zhou, Saugat Kandel et al.
Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real-time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion on X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low dose imaging using orders of magnitude less data than required by traditional methods.
DCMar 15, 2023
Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous ResourcesLogan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson et al.
Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach is our use of cloud-hosted management services to manage challenging aspects of cross-resource authentication and authorization, function-as-a-service (FaaS) function invocation, and data transfer. We show that these methods can achieve performance parity with systems that rely on direct connection between resources. We achieve parity by integrating the FaaS system and data transfer capabilities with a system that passes data by reference among managers and workers, and a user-configurable steering algorithm to hide data transfer latencies. We anticipate that this ease of use can enable routine use of heterogeneous resources in computational science.
CVApr 9, 2023
AI-assisted Automated Workflow for Real-time X-ray Ptychography Data Analysis via Federated ResourcesAnakha V Babu, Tekin Bicer, Saugat Kandel et al.
We present an end-to-end automated workflow that uses large-scale remote compute resources and an embedded GPU platform at the edge to enable AI/ML-accelerated real-time analysis of data collected for x-ray ptychography. Ptychography is a lensless method that is being used to image samples through a simultaneous numerical inversion of a large number of diffraction patterns from adjacent overlapping scan positions. This acquisition method can enable nanoscale imaging with x-rays and electrons, but this often requires very large experimental datasets and commensurately high turnaround times, which can limit experimental capabilities such as real-time experimental steering and low-latency monitoring. In this work, we introduce a software system that can automate ptychography data analysis tasks. We accelerate the data analysis pipeline by using a modified version of PtychoNN -- an ML-based approach to solve phase retrieval problem that shows two orders of magnitude speedup compared to traditional iterative methods. Further, our system coordinates and overlaps different data analysis tasks to minimize synchronization overhead between different stages of the workflow. We evaluate our workflow system with real-world experimental workloads from the 26ID beamline at Advanced Photon Source and ThetaGPU cluster at Argonne Leadership Computing Resources.
DCAug 26, 2024
Employing Artificial Intelligence to Steer Exascale Workflows with ColmenaLogan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson et al.
Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities. We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how their application should respond to events (e.g., task completion) as a series of cooperative agents. In this paper, we describe the design of Colmena, the challenges we overcame while deploying applications on exascale systems, and the science workflows we have enhanced through interweaving AI. The scaling challenges we discuss include developing steering strategies that maximize node utilization, introducing data fabrics that reduce communication overhead of data-intensive tasks, and implementing workflow tasks that cache costly operations between invocations. These innovations coupled with a variety of application patterns accessible through our agent-based steering model have enabled science advances in chemistry, biophysics, and materials science using different types of AI. Our vision is that Colmena will spur creative solutions that harness AI across many domains of scientific computing.
LGSep 24, 2024
Flight: A FaaS-Based Framework for Complex and Hierarchical Federated LearningNathaniel Hudson, Valerie Hayot-Sasson, Yadu Babuji et al.
Federated Learning (FL) is a decentralized machine learning paradigm where models are trained on distributed devices and are aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems like the Internet-of-Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies, asynchronous aggregation, and decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
DCApr 11
Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File SystemsHaochen Pan, Ryan Chard, Song Young Oh et al.
Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes. While external indexing tools like GUFI and Brindexer improve query performance, they remain batch-oriented and unsuitable for heterogeneous, rapidly evolving environments. We present Icicle, a scalable framework for continuous file system metadata indexing and monitoring. Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects. Our experimental evaluation on production-scale HPC datasets demonstrates order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.
DCOct 6, 2021Code
Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance ComputingLogan Ward, Ganesh Sivaraman, J. Gregory Pauloski et al.
Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
LGFeb 5, 2024
Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and VisionNathaniel Hudson, J. Gregory Pauloski, Matt Baughman et al.
Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$Σ$. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
DCOct 15, 2025
FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model AccessAditya Tanikanti, Benoit Côté, Yanfei Guo et al.
We present the Federated Inference Resource Scheduling Toolkit (FIRST), a framework enabling Inference-as-a-Service across distributed High-Performance Computing (HPC) clusters. FIRST provides cloud-like access to diverse AI models, like Large Language Models (LLMs), on existing HPC infrastructure. Leveraging Globus Auth and Globus Compute, the system allows researchers to run parallel inference workloads via an OpenAI-compliant API on private, secure environments. This cluster-agnostic API allows requests to be distributed across federated clusters, targeting numerous hosted models. FIRST supports multiple inference backends (e.g., vLLM), auto-scales resources, maintains "hot" nodes for low-latency execution, and offers both high-throughput batch and interactive modes. The framework addresses the growing demand for private, secure, and scalable AI inference in scientific workflows, allowing researchers to generate billions of tokens daily on-premises without relying on commercial cloud infrastructure.
IVAug 26, 2021
Ultrafast Focus Detection for Automated MicroscopyMaksim Levental, Ryan Chard, Kyle Chard et al.
Technological advancements in modern scientific instruments, such as scanning electron microscopes (SEMs), have significantly increased data acquisition rates and image resolutions enabling new questions to be explored; however, the resulting data volumes and velocities, combined with automated experiments, are quickly overwhelming scientists as there remain crucial steps that require human intervention, for example reviewing image focus. We present a fast out-of-focus detection algorithm for electron microscopy images collected serially and demonstrate that it can be used to provide near-real-time quality control for neuroscience workflows. Our technique, \textit{Multi-scale Histologic Feature Detection}, adapts classical computer vision techniques and is based on detecting various fine-grained histologic features. We exploit the inherent parallelism in the technique to employ GPU primitives in order to accelerate characterization. We show that our method can detect of out-of-focus conditions within just 20ms. To make these capabilities generally available, we deploy our feature detector as an on-demand service and show that it can be used to determine the degree of focus in approximately 230ms, enabling near-real-time use.
GR-QCDec 15, 2020
Accelerated, Scalable and Reproducible AI-driven Gravitational Wave DetectionE. A. Huerta, Asad Khan, Xiaobo Huang et al.
The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed computing service. Using this workflow, an ensemble of four openly available AI models can be run on HAL to process an entire month's worth (August 2017) of advanced Laser Interferometer Gravitational-Wave Observatory data in just seven minutes, identifying all four all four binary black hole mergers previously identified in this dataset and reporting no misclassifications. This approach combines advances in AI, distributed computing, and scientific data infrastructure to open new pathways to conduct reproducible, accelerated, data-driven discovery.
CVOct 16, 2020
Towards Online Steering of Flame Spray Pyrolysis Nanoparticle SynthesisMaksim Levental, Ryan Chard, Joseph A. Libera et al.
Flame Spray Pyrolysis (FSP) is a manufacturing technique to mass produce engineered nanoparticles for applications in catalysis, energy materials, composites, and more. FSP instruments are highly dependent on a number of adjustable parameters, including fuel injection rate, fuel-oxygen mixtures, and temperature, which can greatly affect the quality, quantity, and properties of the yielded nanoparticles. Optimizing FSP synthesis requires monitoring, analyzing, characterizing, and modifying experimental conditions.Here, we propose a hybrid CPU-GPU Difference of Gaussians (DoG)method for characterizing the volume distribution of unburnt solution, so as to enable near-real-time optimization and steering of FSP experiments. Comparisons against standard implementations show our method to be an order of magnitude more efficient. This surrogate signal can be deployed as a component of an online end-to-end pipeline that maximizes the synthesis yield.
BMMay 28, 2020
Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data ReleaseYadu Babuji, Ben Blaiszik, Tom Brettin et al.
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to computer diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources representing over 4.2 B molecules enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules to enable exploration and application of image-based deep learning methods, and 3) 2D and 3D molecular descriptors to speed development of machine learning models. This data release encompasses structural information on the 4.2 B molecules and 60 TB of pre-computed data. Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.
LGNov 27, 2018
DLHub: Model and Data Serving for ScienceRyan Chard, Zhuozhao Li, Kyle Chard et al.
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. DLHub addresses two significant shortcomings in current systems. First, its selfservice model repository allows users to share, publish, verify, reproduce, and reuse models, and addresses concerns related to model reproducibility by packaging and distributing models and all constituent components. Second, it implements scalable and low-latency serving capabilities that can leverage parallel and distributed computing resources to democratize access to published models through a simple web interface. Unlike other model serving frameworks, DLHub can store and serve any Python 3-compatible model or processing function, plus multiple-function pipelines. We show that relative to other model serving systems including TensorFlow Serving, SageMaker, and Clipper, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed. We also describe early uses of DLHub for scientific applications.