IV · Sep 3, 2022 · Code
Masked Sinogram Model with Transformer for ill-Posed Computed Tomography Reconstruction: a Preliminary Study
Zhengchun Liu, Rajkumar Kettimuthu, Ian Foster
Computed Tomography (CT) is an imaging technique in which information about an object is collected at different angles (called projections or scans). The cross-sectional image showing the internal structure of the slice is then produced by solving an inverse problem. Because of constraints such as radiation dose and available projection angles, the produced images can be noisy or contain artifacts. Inspired by the success of transformers in natural language processing, the core idea of this preliminary study is to treat a tomographic projection as a word token and the whole scan of the cross-section (a.k.a. the sinogram) as a sentence. We then explore the idea of a foundation model by training a masked sinogram model (MSM) and fine-tuning it for various downstream applications, including CT reconstruction under data collection restrictions (e.g., photon budget) and a data-driven approach to approximating solutions of the inverse problem for CT reconstruction. Models and data used in this study are available at https://github.com/lzhengchun/TomoTx.
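The projection-as-token idea can be illustrated with a minimal sketch of the masking step alone. This is not the code from the TomoTx repository: the function name `mask_sinogram`, the `mask_ratio` parameter, and the choice to zero-fill hidden projections are all assumptions made for illustration (a real model might use a learned mask token instead).

```python
import random


def mask_sinogram(sinogram, mask_ratio, seed=0):
    """Hide a random subset of projections (rows) of a sinogram.

    sinogram: list of projections, each a list of detector readings.
    Returns (masked_sinogram, masked_indices). Hidden rows are replaced
    by zeros, which is an illustrative assumption, not the paper's choice.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    n_proj = len(sinogram)
    n_masked = round(n_proj * mask_ratio)
    rng = random.Random(seed)
    masked_indices = sorted(rng.sample(range(n_proj), n_masked))
    hidden = set(masked_indices)
    # Copy every row so the input sinogram is left untouched.
    masked = [
        [0.0] * len(row) if i in hidden else list(row)
        for i, row in enumerate(sinogram)
    ]
    return masked, masked_indices


# Toy sinogram: 8 projections ("tokens"), 4 detector pixels each.
sinogram = [[float(angle + det + 1) for det in range(4)] for angle in range(8)]
masked, idx = mask_sinogram(sinogram, mask_ratio=0.5)
print("hidden projections:", idx)
```

A masked model would then be trained to predict the hidden rows from the visible ones, in the same way a masked language model predicts hidden words from their context.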
NI · Apr 27 · Code
Beyond Assumptions: Measuring Federated Learning over Real 5G Networks
Robert J. Hayek, Kayla Comer, Joaquin Chung et al.
Deploying FL using IoT devices is an area poised to significantly benefit from advances in NextG wireless. In this paper, we deploy an FL application using a 5G-NR Standalone (SA) testbed with open-source and Commercial Off-the-Shelf (COTS) components. The 5G testbed architecture consists of a network of resource-constrained edge devices, namely Raspberry Pis, and a central server equipped with a Software Defined Radio (SDR) and running O-RAN software. Our testbed allows edge devices to communicate with the server using WiFi and Ethernet in addition to 5G. FL is deployed using the Flower FL framework, extended with custom instrumentation for communication and ML metrics. We analyze the FL application across three network interfaces (5G, WiFi, and Ethernet) as well as across 5G bandwidths and uplink-downlink scheduling ratios. Our experimental results challenge some common assumptions about communication time in FL over wireless, and we discuss the potential pitfalls of these assumptions. We find that there is a consistent straggler in about 70% of trials, while in the other 30%, high communication time causes competing stragglers. We also compare FL performance over 5G with and without external congestion and compare our testbed to commercial 5G to validate our findings in a broader context. For reproducibility, we have open-sourced our FL application, instrumentation tools, and testbed configuration.
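To make the straggler notion concrete, here is a minimal sketch under the assumption of synchronous FL, where a round finishes only when the slowest client has reported. The client names and timings are invented for illustration and are not measurements from the paper; the helper names `stragglers` and `round_durations` are hypothetical and unrelated to the Flower API.

```python
from collections import Counter


def stragglers(round_times):
    """Return the slowest client of each round.

    round_times: list of dicts mapping client id -> total seconds
    (compute plus communication) for that round.
    """
    return [max(times, key=times.get) for times in round_times]


def round_durations(round_times):
    """In synchronous FL, a round lasts as long as its slowest client."""
    return [max(times.values()) for times in round_times]


# Hypothetical timings (seconds) for three rounds and three edge devices.
rounds = [
    {"pi-1": 12.0, "pi-2": 15.5, "pi-3": 11.2},
    {"pi-1": 12.4, "pi-2": 16.1, "pi-3": 11.0},
    {"pi-1": 17.3, "pi-2": 15.8, "pi-3": 11.5},
]

slowest = stragglers(rounds)
durations = round_durations(rounds)
most_common_client, count = Counter(slowest).most_common(1)[0]
print(slowest, durations, most_common_client, count)
```

In this toy data one client is slowest in two of three rounds, while a different client takes over in the third; that is the kind of distinction between a consistent straggler and competing stragglers that per-client instrumentation makes visible.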
QUANT-PH · Nov 17, 2024
Simulation of Entanglement-Enabled Connectivity in QLANs using SeQUeNCe
Francesco Mazza, Caitao Zhan, Joaquin Chung et al.
Quantum Local Area Networks (QLANs) represent a promising building block for larger scale quantum networks with the ambitious goal -- in a long time horizon -- of realizing a Quantum Internet. Surprisingly, the physical topology of a QLAN can be enriched by a set of artificial links, enabled by shared multipartite entangled states among the nodes of the network. This novel concept of artificial topology revolutionizes the possibilities of connectivity within the local network, enabling an on-demand manipulation of the artificial network topology. In this paper, we discuss the implementation of the QLAN model in SeQUeNCe, a discrete-event simulator of quantum networks. Specifically, we provide an analysis of how network nodes interact, with an emphasis on the interplay between quantum operations and classical signaling within the network. Remarkably, through the modeling of a measurement protocol and a correction protocol, our QLAN model implementation enables the simulation of the manipulation process of a shared entangled quantum state, and the subsequent engineering of the entanglement-based connectivity. Our simulations demonstrate how to obtain different virtual topologies with different manipulations of the shared resources and with all the possible measurement outcomes, with an arbitrary number of nodes within the network.
LG · Apr 20, 2022
fairDMS: Rapid Model Training by Data and Model Reuse
Ahsan Ali, Hemant Sharma, Rajkumar Kettimuthu et al.
Extracting actionable information rapidly from data produced by instruments such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming ever more challenging due to high (up to TB/s) data rates. Conventional physics-based information retrieval methods are hard-pressed to detect interesting events fast enough to enable timely focusing on a rare event or correction of an error. Machine learning (ML) methods that learn cheap surrogate classifiers present a promising alternative, but can fail catastrophically when changes in instrument or sample result in degradation in ML performance. To overcome such difficulties, we present a new data storage and ML model training architecture designed to organize large volumes of data and models so that when model degradation is detected, prior models and/or data can be queried rapidly and a more suitable model retrieved and fine-tuned for new conditions. We show that our approach can achieve up to 100x data labelling speedup compared to the current state-of-the-art, 200x improvement in training speed, and 92x speedup in terms of end-to-end model updating time.
QUANT-PH · Mar 15
InterQnet: A Heterogeneous Full-Stack Approach to Co-designing Scalable Quantum Networks
Joaquin Chung, Daniel Dilley, Ely Eastman et al.
Quantum communications have progressed significantly, moving from a theoretical concept to small-scale experiments to recent metropolitan-scale demonstrations. As the technology matures, it is expected to revolutionize quantum computing in much the same way that classical networks revolutionized classical computing. Quantum communications will also enable breakthroughs in quantum sensing, metrology, and other areas. However, scalability has emerged as a major challenge, particularly in terms of the number and heterogeneity of nodes, the distances between nodes, the diversity of applications, and the scale of user demand. This paper describes InterQnet, a multidisciplinary project that advances scalable quantum communications through a comprehensive approach that improves devices, error handling, and network architecture. InterQnet has a two-pronged strategy to address scalability challenges: InterQnet-Achieve focuses on practical realizations of heterogeneous quantum networks by building and then integrating first-generation quantum repeaters with error mitigation schemes and centralized automated network control systems. The resulting system will enable quantum communications between two heterogeneous quantum platforms through a third type of platform operating as a repeater node. InterQnet-Scale focuses on a systems study of architectural choices for scalable quantum networks by developing forward-looking models of quantum network devices, advanced error correction schemes, and entanglement protocols. Here we report our current progress toward achieving our scalability goals.
AR · Dec 4, 2025
DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs
Ziyu Hu, Zhiqing Zhong, Weijian Zheng et al.
The exponential growth of large language models has outpaced the capabilities of traditional CPU and GPU architectures due to the slowdown of Moore's Law. Dataflow AI accelerators present a promising alternative; however, there remains a lack of in-depth performance analysis and standardized benchmarking methodologies for LLM training. We introduce DABench-LLM, the first benchmarking framework designed for evaluating LLM workloads on dataflow-based accelerators. By combining intra-chip performance profiling and inter-chip scalability analysis, DABench-LLM enables comprehensive evaluation across key metrics such as resource allocation, load balance, and resource efficiency. The framework helps researchers rapidly gain insights into underlying hardware and system behaviors, and provides guidance for performance optimizations. We validate DABench-LLM on three commodity dataflow accelerators, Cerebras WSE-2, SambaNova RDU, and Graphcore IPU. Our framework reveals performance bottlenecks and provides specific optimization strategies, demonstrating its generality and effectiveness across a diverse range of dataflow-based AI hardware platforms.
LG · Dec 7, 2023
Rapid detection of rare events from in situ X-ray diffraction data using machine learning
Weijian Zheng, Jun-Sang Park, Peter Kenesei et al.
High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs of traditional data acquisition and reduction approaches pose a barrier to quickly extracting actionable insights and improving the temporal resolution of these snapshots. Here we present a fully automated technique capable of rapidly detecting the onset of plasticity in high-energy X-ray microscopy data. Our technique is computationally faster by at least 50 times than the traditional approaches and works for data sets that are up to 9 times sparser than a full data set. This new technique leverages self-supervised image representation learning and clustering to transform massive data into compact, semantic-rich representations of visually salient characteristics (e.g., peak shapes). These characteristics can be a rapid indicator of anomalous events such as changes in diffraction peak shapes. We anticipate that this technique will provide just-in-time actionable information to drive smarter experiments that effectively deploy multi-modal X-ray diffraction methods that span many decades of length scales.
LG · Oct 3, 2025
Diffusion-Based, Data-Assimilation-Enabled Super-Resolution of Hub-height Winds
Xiaolong Ma, Xu Dong, Ashley Tarrant et al.
High-quality observations of hub-height winds are valuable but sparse in space and time. Simulations are widely available on regular grids but are generally biased and too coarse to inform wind-farm siting or to assess extreme-weather-related risks (e.g., gusts) at infrastructure scales. To fully utilize both data types for generating high-quality, high-resolution hub-height wind speeds (tens to ~100m above ground), this study introduces WindSR, a diffusion model with data assimilation for super-resolution downscaling of hub-height winds. WindSR integrates sparse observational data with simulation fields during downscaling using state-of-the-art diffusion models. A dynamic-radius blending method is introduced to merge observations with simulations, providing conditioning for the diffusion process. Terrain information is incorporated during both training and inference to account for its role as a key driver of winds. Evaluated against convolutional-neural-network and generative-adversarial-network baselines, WindSR outperforms them in both downscaling efficiency and accuracy. Our data assimilation reduces WindSR's model bias by approximately 20% relative to independent observations.
CV · Jan 24, 2025
Effective Defect Detection Using Instance Segmentation for NDI
Ashiqur Rahman, Venkata Devesh Reddy Seethi, Austin Yunker et al.
Ultrasonic testing is a common Non-Destructive Inspection (NDI) method used in aerospace manufacturing. However, the complexity and size of the ultrasonic scans make it challenging to identify defects through visual inspection or machine learning models. Using computer vision techniques to identify defects from ultrasonic scans is an evolving research area. In this study, we used instance segmentation to identify the presence of defects in the ultrasonic scan images of composite panels that are representative of real components manufactured in aerospace. We used two models based on Mask-RCNN (Detectron 2) and YOLO 11 respectively. Additionally, we implemented a simple statistical pre-processing technique that reduces the burden of requiring custom-tailored pre-processing techniques. Our study demonstrates the feasibility and effectiveness of using instance segmentation in the NDI pipeline by significantly reducing data pre-processing time, inspection time, and overall costs.
DC · Jun 22, 2021
BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes
Zhengchun Liu, Rajkumar Kettimuthu, Michael E. Papka et al.
Supercomputer FCFS-based scheduling policies result in many transient idle nodes, a phenomenon that is only partially alleviated by backfill scheduling methods that promote small jobs to run before large jobs. Here we describe how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training. This important workload is easily organized as many small fragments that can be configured dynamically to fit essentially any node*time hole in a supercomputer's schedule. We describe how the task of rescaling suitable DNN training tasks to fit dynamically changing holes can be formulated as a deterministic mixed integer linear programming (MILP)-based resource allocation algorithm, and show that this MILP problem can be solved efficiently at run time. We show further how this MILP problem can be adapted to optimize for administrator- or user-defined metrics. We validate our method with supercomputer scheduler logs and different DNN training scenarios, and demonstrate efficiencies of up to 93% compared with running the same training tasks on dedicated nodes. Our method thus enables substantial supercomputer resources to be allocated to DNN training with no impact on other applications.
LG · Jan 18, 2021
Fast and accurate learned multiresolution dynamical downscaling for precipitation
Jiali Wang, Zhengchun Liu, Ian Foster et al.
This study develops a neural network-based approach for emulating high-resolution modeled precipitation data with comparable statistical properties but at greatly reduced computational cost. The key idea is to use a combination of low- and high-resolution simulations to train a neural network to map from the former to the latter. Specifically, we define two types of CNNs, one that stacks variables directly and one that encodes each variable before stacking, and we train each CNN type both with a conventional loss function, such as mean square error (MSE), and with a conditional generative adversarial network (CGAN), for a total of four CNN variants. We compare the four new CNN-derived high-resolution precipitation results with precipitation generated from the original high-resolution simulations, a bilinear interpolator, and the state-of-the-art CNN-based super-resolution (SR) technique. Results show that the SR technique produces results similar to those of the bilinear interpolator, with smoother spatial and temporal distributions and smaller data variabilities and extremes than the original high-resolution simulations. While the new CNNs trained by MSE generate better results over some regions than the interpolator and SR technique do, their predictions still do not closely match the original high-resolution simulations. The CNNs trained by CGAN generate more realistic and physically reasonable results, better capturing not only data variability in time and space but also extremes such as intense and long-lasting storms. The newly proposed CNN-based downscaling approach can downscale precipitation from 50 km to 12 km in 14 min for 30 years once the network is trained (training takes 4 hours using 1 GPU), whereas conventional dynamical downscaling would take 1 month using 600 CPU cores to generate simulations at a resolution of 12 km over the contiguous United States.
IV · Aug 18, 2020
BraggNN: Fast X-ray Bragg Peak Analysis Using Deep Learning
Zhengchun Liu, Hemant Sharma, Jun-Sang Park et al.
X-ray diffraction based microscopy techniques such as High Energy Diffraction Microscopy rely on knowledge of the position of diffraction peaks with high precision. These positions are typically computed by fitting the observed intensities in area detector data to a theoretical peak shape such as pseudo-Voigt. As experiments become more complex and detector technologies evolve, the computational cost of such peak detection and shape fitting becomes the biggest hurdle to the rapid analysis required for real-time feedback during in-situ experiments. To this end, we propose BraggNN, a deep learning-based method that can determine peak positions much more rapidly than conventional pseudo-Voigt peak fitting. When applied to a test dataset, BraggNN gives errors of less than 0.29 and 0.57 pixels, relative to the conventional method, for 75% and 95% of the peaks, respectively. When applied to a real experimental dataset, a 3D reconstruction that used peak positions computed by BraggNN yields 15% better results on average as compared to a reconstruction obtained using peak positions determined using conventional 2D pseudo-Voigt fitting. Recent advances in deep learning method implementations and special-purpose model inference accelerators allow BraggNN to deliver enormous performance improvements relative to the conventional method, running, for example, more than 200 times faster than a conventional method on a consumer-class GPU card with out-of-the-box software.
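For readers unfamiliar with the pseudo-Voigt peak shape mentioned above, the following is a minimal one-dimensional sketch of the commonly used definition: a weighted sum of a Lorentzian and a Gaussian that share the same full width at half maximum. The function name and parameter names are illustrative assumptions; the conventional method described in the abstract fits a 2D profile to detector data, which this sketch does not attempt.

```python
import math


def pseudo_voigt(x, center, fwhm, eta, amplitude=1.0):
    """Evaluate a 1D pseudo-Voigt profile at x.

    eta in [0, 1] is the Lorentzian fraction; both components share the
    same full width at half maximum (fwhm) and have unit peak height.
    """
    if fwhm <= 0:
        raise ValueError("fwhm must be positive")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be in [0, 1]")
    t = (x - center) / fwhm
    lorentzian = 1.0 / (1.0 + 4.0 * t * t)
    gaussian = math.exp(-4.0 * math.log(2.0) * t * t)
    return amplitude * (eta * lorentzian + (1.0 - eta) * gaussian)


# The profile peaks at the center and drops to half height at +/- fwhm / 2.
peak = pseudo_voigt(5.0, center=5.0, fwhm=2.0, eta=0.3)
half = pseudo_voigt(6.0, center=5.0, fwhm=2.0, eta=0.3)
print(peak, half)
```

Peak fitting searches for the `center`, `fwhm`, `eta`, and `amplitude` that best reproduce the observed intensities, which is the iterative step a learned regressor such as BraggNN is meant to replace.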
IV · Nov 12, 2019
Scientific Image Restoration Anywhere
Vibhatha Abeykoon, Zhengchun Liu, Rajkumar Kettimuthu et al.
The use of deep learning models within scientific experimental facilities frequently requires low-latency inference, so that, for example, quality control operations can be performed while data are being collected. Edge computing devices can be useful in this context, as their low cost and compact form factor permit them to be co-located with the experimental apparatus. Can such devices, with their limited resources, perform neural network feed-forward computations efficiently and effectively? We explore this question by evaluating the performance and accuracy of a scientific image restoration model, for which both model input and output are images, on edge computing devices. Specifically, we evaluate deployments of TomoGAN, an image-denoising model based on generative adversarial networks developed for low-dose x-ray imaging, on the Google Edge TPU and NVIDIA Jetson. We adapt TomoGAN for edge execution, evaluate model inference performance, and propose methods to address the accuracy drop caused by model quantization. We show that these edge computing devices can deliver accuracy comparable to that of a full-fledged CPU or GPU model, at speeds that are more than adequate for use in the intended deployments, denoising a 1024 x 1024 image in less than a second. Our experiments also show that the Edge TPU models can provide 3x faster inference response than a CPU-based model and 1.5x faster than an edge GPU-based model. This combination of high speed and low cost permits image restoration anywhere.
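The accuracy drop caused by quantization can be seen in a minimal sketch of generic 8-bit affine quantization. This is an assumption-laden illustration, not the scheme used in the paper or by any particular toolchain: the function names and sample values are hypothetical, and the range is simply stretched to include zero so that zero is represented exactly.

```python
def quantize_uint8(values):
    """Map floats to 8-bit codes with an affine (scale, zero_point) scheme."""
    lo = min(min(values), 0.0)  # make sure 0.0 is inside the range
    hi = max(max(values), 0.0)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    codes = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from 8-bit codes."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical weights or activations.
values = [-1.0, -0.25, 0.0, 0.4, 1.5]
codes, scale, zero_point = quantize_uint8(values)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(codes, max_error)
```

Each value is recovered only to within about half a quantization step (`scale / 2`), and these small errors accumulate across layers, which is one reason a quantized model can lose accuracy relative to its floating-point original.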
IV · Oct 9, 2019
Deep Learning Accelerated Light Source Experiments
Zhengchun Liu, Tekin Bicer, Rajkumar Kettimuthu et al.
Experimental protocols at synchrotron light sources typically process and validate data only after an experiment has completed, which can lead to undetected errors and cannot enable online steering. Real-time data analysis can enable both detection of, and recovery from, errors, and optimization of data acquisition. However, modern scientific instruments, such as detectors at synchrotron light sources, can generate data at GBs/sec rates. Data processing methods such as the widely used computational tomography usually require considerable computational resources, and yield poor quality reconstructions in the early stages of data acquisition when available views are sparse. We describe here how a deep convolutional neural network can be integrated into the real-time streaming tomography pipeline to enable better-quality images in the early stages of data acquisition. Compared with conventional streaming tomography processing, our method can significantly improve tomography image quality, deliver comparable images using only 32% of the data needed for conventional streaming processing, and save 68% of the experiment time for data acquisition.
CV · Feb 20, 2019
TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks
Zhengchun Liu, Tekin Bicer, Rajkumar Kettimuthu et al.
Synchrotron-based x-ray tomography is a noninvasive imaging technique that allows for reconstructing the internal structure of materials at high spatial resolutions from tens of micrometers to a few nanometers. In order to resolve sample features at smaller length scales, however, a higher radiation dose is required. Therefore, the limitation on the achievable resolution is set primarily by noise at these length scales. We present TomoGAN, a denoising technique based on generative adversarial networks, for improving the quality of reconstructed images for low-dose imaging conditions. We evaluate our approach in two photon-budget-limited experimental conditions: (1) a sufficient number of low-dose projections (based on Nyquist sampling), and (2) an insufficient or limited number of high-dose projections. In both cases the angular sampling is assumed to be isotropic, and the photon budget throughout the experiment is fixed based on the maximum allowable radiation dose on the sample. Evaluation with both simulated and experimental datasets shows that our approach can significantly reduce noise in reconstructed images, improving the structural similarity score of simulation and experimental data from 0.18 to 0.9 and from 0.18 to 0.41, respectively. Furthermore, the quality of the reconstructed images with filtered back projection followed by our denoising approach exceeds that of reconstructions with the simultaneous iterative reconstruction technique, showing the computational superiority of our approach.