DC · May 27, 2022
Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications
Ayesha Afzal, Georg Hager, Gerhard Wellein et al.
This paper studies the utility of data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with a regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new "phase space plot," we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
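The following is a minimal NumPy sketch of the PCA step. It assumes the observable is stored as a (time steps × processes) array of per-process MPI time. The layout and the name `pca_scores` are hypothetical, and the paper's clustering and phase space plot are not reproduced. Each process is treated as one sample whose features are its MPI times across time steps:

```python
import numpy as np

def pca_scores(mpi_time: np.ndarray, k: int = 2):
    """Project each MPI process onto the first k principal components.

    mpi_time has shape (n_steps, n_procs): column j holds the MPI time per
    time step of process j (an assumed layout, not the paper's format).
    Returns (scores, explained_variance_ratio) for the first k components.
    """
    # Treat processes as samples and time steps as features.
    x = mpi_time.T.astype(float)
    x = x - x.mean(axis=0)                 # center every feature (time step)
    # Thin SVD: rows of vt are principal axes; s**2 is proportional to variance.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    scores = x @ vt[:k].T                  # each process in PC coordinates
    return scores, ratio[:k]
```

Processes that desynchronize in the same way should land close together in score space, and a subsequent clustering step would pick up those groups. The sketch uses an SVD of the centered matrix rather than an explicit covariance matrix because the SVD is numerically more stable.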
MS · Jul 14, 2022
FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries
Yifei He, Artur Podobas, Måns I. Andersson et al.
Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies for the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR), for expressing Fourier Transform algorithms. We present the initial design, implementation, and preliminary results of FFTc.
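FFTc expresses Fourier transform algorithms in MLIR. To illustrate the kind of recursive decomposition such a DSL describes, here is the standard radix-2 Cooley-Tukey FFT in plain Python. This is not FFTc code, and the function name `fft` is hypothetical:

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^{-2*pi*i*k/n} combines the two half-size DFTs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each level halves the problem and recombines the halves with twiddle factors. This gives the O(n log n) cost that generated FFT libraries then specialize for a target machine.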
PLASM-PH · Mar 25
Multi-GPU Hybrid Particle-in-Cell Monte Carlo Simulations for Exascale Computing Systems
Jeremy J. Williams, Jordy Trilaksono, Stefan Costea et al.
Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple accelerators. In this work, we present a portable, multi-GPU hybrid MPI+OpenMP implementation of BIT1 that enables scalable execution on both Nvidia and AMD accelerators through OpenMP target tasks with explicit dependencies to overlap computation and communication across devices. Portability is achieved through persistent device-resident memory, an optimized contiguous one-dimensional data layout, and a transition from unified to pinned host memory to improve large data-transfer efficiency, together with GPU Direct Memory Access (DMA) and runtime interoperability for direct device-pointer access. Standardized and scalable I/O is provided using openPMD and ADIOS2, supporting high-performance file I/O, in-memory data streaming, and in-situ analysis and visualization. Performance results on pre-exascale and exascale systems, including Frontier (OLCF-5) for up to 16,000 GPUs, demonstrate significant improvements in run time, scalability, and resource utilization for large-scale PIC MC simulations.
ET · May 21
Which Superconducting Qubit Model is Good Enough? From Effective Two-Level to Circuit-Based Hamiltonians for Pulse-Level Simulation
Frej Larssen, Ivy Peng, Stefano Markidis
Pulse-level simulators are the lowest-level, most widely used abstraction layer for studying how quantum hardware responds to control signals, but they can be built on Hamiltonian models with very different fidelity and cost. This raises the question: which level of physical abstraction is sufficient for a given simulation objective? We study this question for a flux-tunable two-qubit superconducting device with a fixed bus coupler by comparing three Hamiltonian descriptions of the same hardware: an effective two-level model, a three-mode Duffing model, and a circuit-based transmon model in the charge basis. Using a realistic parameter set, we evaluate these models on a common benchmark suite spanning flux-dependent spectra, extracted two-qubit interaction terms, driven single-qubit dynamics, CZ gate dynamics, leakage outside the computational subspace, and runtime. Across the tested flux range, the Duffing model follows the circuit-based reference more closely than the effective model for static spectra and reduced two-qubit quantities, while in driven benchmarks, the multilevel models reveal effects absent in the effective description. Overall, the results support a layered use of abstraction in pulse-level simulation: effective models for reduced analyses, Duffing models as a practical multilevel default, and circuit-based models for high-fidelity reference simulation or detailed leakage analysis.
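The sketch below shows why multilevel models capture effects that the two-level model misses. It builds a single-mode Duffing Hamiltonian, $H = \omega a^\dagger a + \frac{\alpha}{2} a^\dagger a^\dagger a a$, in a truncated Fock basis. The parameter values are illustrative, not the paper's parameter set. The paper's model also couples three such modes, which is not shown:

```python
import numpy as np

def duffing_hamiltonian(omega: float, alpha: float, levels: int = 4) -> np.ndarray:
    """H = omega a^dag a + (alpha / 2) a^dag a^dag a a in a truncated Fock basis (hbar = 1)."""
    a = np.diag(np.sqrt(np.arange(1, levels)), k=1)   # annihilation operator
    ad = a.conj().T                                   # creation operator
    return omega * ad @ a + 0.5 * alpha * ad @ ad @ a @ a

# Illustrative values in GHz (angular factors dropped): omega = 5.0, alpha = -0.2.
h = duffing_hamiltonian(omega=5.0, alpha=-0.2, levels=4)
energies = np.linalg.eigvalsh(h)
transitions = np.diff(energies)   # omega, omega + alpha, omega + 2 * alpha
```

The anharmonicity α shifts the 1→2 transition away from the 0→1 transition. A two-level model truncates the |2⟩ state entirely, so it cannot describe leakage into it.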
QUANT-PH · May 15
When Noisy Quantum Order Finding Remains Recoverable for Shor's Algorithm
Qingxin Yang, Stefano Markidis
Order finding is the core subroutine of Shor's algorithm. On NISQ hardware, phase estimation output distributions are often distorted by noise, making correct order recovery difficult. We study recoverability in noisy order finding: given a measured precision-register distribution, when does standard classical post-processing still return the true order? We analyze 680 distributions from IBM quantum systems across problem instances and circuit settings. For each distribution, we apply continued-fraction post-processing with modular verification and define recoverability as whether the recovered order equals the true one. We characterize each distribution using four features: autocorrelation peak strength, normalized entropy, dominant verified mass fraction, and verified margin fraction. We evaluate these quantities using marginal feature comparisons, single-feature AUROC analysis, and multivariate tree-based classifiers. We use random-forest permutation importance to assess which quantities contribute distinct predictive information once the other features are known. To make classification behavior interpretable, we train a decision tree that exposes threshold rules for recoverable and non-recoverable distributions. We find that recoverability is strongly associated with residual comb-like structure in the measured distribution and the way verified probability mass is organized across candidate denominators. The dominant verified mass fraction is the strongest single-feature indicator of recoverability, and tree-based analysis shows it also provides the primary split in an interpretable threshold description. Some highly distorted distributions remain recoverable when one verified denominator dominates the post-processing mass, while some visibly structured distributions fail because classical post-processing favors an incorrect verified denominator.
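The continued-fraction post-processing with modular verification described above can be sketched in plain Python. The function names and the `verified_mass` aggregation are hypothetical illustrations. The paper's exact definitions of its features, such as the dominant verified mass fraction, may differ:

```python
def convergent_denominators(y: int, n_bits: int):
    """Yield the denominators q_k of the continued-fraction convergents of y / 2**n_bits."""
    num, den = y, 1 << n_bits
    q_prev2, q_prev1 = 1, 0            # q_{-2}, q_{-1}
    while den:
        a = num // den                 # next partial quotient a_k
        q = a * q_prev1 + q_prev2      # q_k = a_k q_{k-1} + q_{k-2}
        yield q
        q_prev2, q_prev1 = q_prev1, q
        num, den = den, num - a * den  # Euclid step


def recover_order(y: int, n_bits: int, a: int, N: int):
    """Return the first convergent denominator r < N with a**r = 1 (mod N), else None."""
    for r in convergent_denominators(y, n_bits):
        if 0 < r < N and pow(a, r, N) == 1:   # modular verification
            return r
    return None


def verified_mass(dist: dict, n_bits: int, a: int, N: int) -> dict:
    """Hypothetical helper: total probability mass of measured outcomes per verified order."""
    mass = {}
    for y, p in dist.items():
        r = recover_order(y, n_bits, a, N)
        if r is not None:
            mass[r] = mass.get(r, 0.0) + p
    return mass
```

Consider N = 15 and a = 7 (whose order is 4) with an 8-bit precision register. Both `recover_order(64, 8, 7, 15)` and `recover_order(65, 8, 7, 15)` return 4. In contrast, `recover_order(128, 8, 7, 15)` returns `None`, because 128/256 = 1/2 only exposes the divisor 2 of the order.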
LG · Feb 2
Unsupervised Physics-Informed Operator Learning through Multi-Stage Curriculum Training
Paolo Marcandelli, Natansh Mathur, Stefano Markidis et al.
Solving partial differential equations remains a central challenge in scientific machine learning. Neural operators offer a promising route by learning mappings between function spaces and enabling resolution-independent inference, yet they typically require supervised data. Physics-informed neural networks address this limitation through unsupervised training with physical constraints but often suffer from unstable convergence and limited generalization capability. To overcome these issues, we introduce a multi-stage physics-informed training strategy that achieves convergence by progressively enforcing boundary conditions in the loss landscape and subsequently incorporating interior residuals. At each stage the optimizer is re-initialized, acting as a continuation mechanism that restores stability and prevents gradient stagnation. We further propose the Physics-Informed Spline Fourier Neural Operator (PhIS-FNO), combining Fourier layers with Hermite spline kernels for smooth residual evaluation. Across canonical benchmarks, PhIS-FNO attains a level of accuracy comparable to that of supervised learning, using labeled information only along a narrow boundary region, establishing staged, spline-based optimization as a robust paradigm for physics-informed operator learning.
CE · May 13
Robust Matrix-Free Newton-Krylov Solvers via Automatic Differentiation
Marco Pasquale, Stefano Markidis
Jacobian-Free Newton-Krylov (JFNK) methods avoid forming the full Jacobian, but still require Jacobian-vector products, i.e., Gateaux derivatives of the nonlinear residual along Krylov directions. In standard Finite Difference (FD) formulations, these products are obtained by perturbing the Newton state and differencing residuals, making the linearization sensitive to round-off error and floating-point precision. This work evaluates the global impact of forward-mode Automatic Differentiation (AD) as a replacement for FD Jacobian-vector products in finite-precision JFNK solvers. The comparison keeps the discretization, Newton iteration, line search, Krylov methods, tolerances, and CPU/GPU backend fixed, varying only the linearization strategy. Benchmarks include Burgers dynamics, Su-Olson radiation diffusion, reaction-diffusion, and nonlinear time-harmonic Maxwell equations, each evaluated in different nonlinear regimes. By preventing degradation of the Krylov operator, AD accelerates computation by 2-3 orders of magnitude across both CPU and GPU architectures. More importantly, it drastically improves global solver robustness, achieving a minimum completion rate of 95%, compared to just 42% for FD. Ultimately, accurate Gateaux derivatives unify performance and accuracy in JFNK methods, making AD the optimal choice for stiff nonlinear and reduced-precision environments.
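The following toy sketch in Python makes the FD-versus-AD contrast concrete. Forward-mode AD via dual numbers gives the exact Jacobian-vector product, while the FD version depends on the perturbation size `eps`. The residual, the `Dual` class, and the default `eps` are hypothetical illustrations, not the solvers or AD tooling used in the paper:

```python
import numpy as np

class Dual:
    """Minimal forward-mode AD number val + eps*dot with eps**2 = 0."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __rsub__(self, other):
        return self._lift(other) - self
    def __mul__(self, other):
        o = self._lift(other)   # product rule carries the tangent
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def residual(u):
    """Toy nonlinear residual F_i = u_i**3 - u_{i-1} - 1 with u_{-1} := 0 (hypothetical)."""
    return [u[i] * u[i] * u[i] - (u[i - 1] if i > 0 else 0.0) - 1.0 for i in range(len(u))]

def jvp_ad(F, u, v):
    """Exact J(u) v: seed every input tangent with the Krylov direction v."""
    out = F([Dual(ui, vi) for ui, vi in zip(u, v)])
    return np.array([o.dot for o in out])

def jvp_fd(F, u, v, eps=1e-7):
    """Forward-difference approximation (F(u + eps v) - F(u)) / eps."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return (np.array(F(list(u + eps * v))) - np.array(F(list(u)))) / eps
```

With `eps=1e-7` the two results agree closely for this residual. As `eps` shrinks toward machine precision, round-off in `F(u + eps v) - F(u)` dominates the FD result, which is the sensitivity the abstract describes.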
ET · May 8
Post-Moore Technologies for Plasma Simulation: A Community Roadmap
Luca Pennati, Erik M. Åsgrim, Jeremy J. Williams et al.
Plasma simulations are among the most computationally demanding scientific workloads, combining high-dimensional kinetic evolution, particle-mesh coupling, field solves, and data-intensive communication. As general-purpose processor scaling slows, post-Moore technologies are being explored to address bottlenecks in data movement, memory access, and power consumption. This paper provides a community perspective on the role of these technologies in plasma simulation, assessing three major classes: reconfigurable and data-path accelerators, non-von Neumann architectures, and quantum computing. Each is evaluated, in a co-design approach, against representative plasma workloads spanning particle-in-cell, continuum Vlasov, gyrokinetic, fluid/MHD, hybrid, and warm dense matter methods. We find that no single technology can replace existing HPC platforms. Instead, three tiers of opportunity emerge: FPGA-class and data-path accelerators offer near-term kernel offload and workflow-level data services, non-von Neumann architectures represent medium-term directions for operator-level acceleration, and quantum computing, although the least mature, is potentially the most disruptive for warm dense matter and inertial confinement fusion microphysics. We outline best practices for selective adoption and identify focused demonstrators, benchmarking, and modular software ecosystems as immediate community priorities.
CE · May 7
Quantum Optimization for Electromagnetics: Physics-Informed QAOA for Reconfigurable Intelligent Surfaces
Marco Pasquale, Erik M. Åsgrim, Stefano Markidis et al.
Optimizing Reconfigurable Intelligent Surfaces (RIS) is a high-dimensional combinatorial challenge. Current quantum algorithms often simplify this problem by ignoring physical constraints like mutual coupling, which significantly degrades real-world performance. Rather than targeting a fully realistic RIS description, we embed progressively more physics-informed models of mutual coupling into Quadratic Unconstrained Binary Optimization (QUBO) formulations. We evaluate four Ising interaction models ($J_{ij}$) for the Quantum Approximate Optimization Algorithm (QAOA), ranging from idealized phase-only to fully dense physical models. Analyzing a $5 \times 5$ grid, our results expose a critical trade-off between spatial pointing accuracy and quantum hardware feasibility. While complete global coupling maximizes beamforming precision, dense Hamiltonians introduce prohibitive routing overhead and complicate convergence on near-term processors. Ultimately, we demonstrate that while physics-informed quantum optimization is mathematically viable, sparse, distance-penalized models remain a necessary compromise for execution on current noisy intermediate-scale quantum (NISQ) devices.
CE · Apr 21
Mass Matrix Assembly on Tensor Cores for Implicit Particle-In-Cell Methods
Luca Pennati, Stefano Markidis
Matrix-multiply-accumulate (MMA) units, or tensor cores, are now widespread across modern computing architectures. Yet, their use for particle-grid operators remains limited. In implicit particle methods, mass-matrix assembly is a reduction-dominated kernel in which weighted outer products of interpolation weights are accumulated over particle support. We show that this operation can be reformulated exactly, cell by cell, as a sequence of matrix products matched to hardware MMA tiles. The formulation is general with respect to interpolation order and hardware platform, and applies to both scalar mass matrices and the tensorial block mass matrix arising in the Energy-Conserving Semi-Implicit Method (ECSIM) for Particle-in-Cell simulations. We introduce particle batching and a support-group decomposition for higher-order shape functions whose stencil extends beyond a single cell, specialize the method to first- and second-order B-spline interpolation, and implement it on NVIDIA tensor cores. The resulting kernels achieve speedups of up to 3x over optimized conventional implementations and reduce end-to-end ECSIM runtime by 15%.
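The core identity behind the reformulation is that the per-cell sum of weighted outer products equals a matrix product: $M = \sum_p q_p\, w_p w_p^{T} = (W^{T} \operatorname{diag}(q))\, W$. Here row $p$ of $W$ holds particle $p$'s interpolation weights on the cell's support nodes. The following NumPy sketch covers only the scalar case; the tensorial ECSIM block matrix is not covered. The batch size of 16 is chosen purely for illustration and makes no claim about tile shapes on any GPU:

```python
import numpy as np

def mass_matrix_loop(weights: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Reference: accumulate q_p * outer(w_p, w_p) particle by particle."""
    s = weights.shape[1]
    m = np.zeros((s, s))
    for w_p, q_p in zip(weights, q):
        m += q_p * np.outer(w_p, w_p)
    return m

def mass_matrix_batched(weights: np.ndarray, q: np.ndarray, batch: int = 16) -> np.ndarray:
    """Same sum written as (W^T diag(q)) @ W, accumulated over particle batches.

    Each batch contributes one (support x batch) @ (batch x support) product.
    """
    s = weights.shape[1]
    m = np.zeros((s, s))
    for start in range(0, len(q), batch):
        w_b = weights[start:start + batch]
        q_b = q[start:start + batch]
        m += (w_b * q_b[:, None]).T @ w_b
    return m
```

Every batched product has the same fixed shape, so the partial results can be accumulated tile by tile. This regular shape is what makes a mapping onto MMA units possible.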
CE · Apr 10
BVH-Accelerated Ray Tracing for High-Frequency Electromagnetic Backscattering
Marco Pasquale, Andong Hu, Luca Pennati et al.
As computational complexity in electromagnetics increases with frequency, full-wave solvers become computationally infeasible for electrically large problems. To address this limitation, we present a shooting and bouncing rays (SBR) method for efficiently modeling electromagnetic backscattering of metallic objects in the high-frequency regime. The method couples multi-reflection geometrical-optics ray transport with a physical optics surface integral discretized over ray tubes. To reduce the massive ray-surface intersection search space, we use a bounding volume hierarchy (BVH) and organize the computation as a trace-integrate pipeline. The ray tracing generates hit data, and the physical optics integral is evaluated over valid intersections only. Numerical accuracy is controlled through an incident-ray sampling rule that mitigates phase aliasing in the discretized physical optics integration. The method is accelerated on NVIDIA and AMD GPUs and parallelized with MPI. We validate against analytical Mie solutions for a perfectly electrically conducting (PEC) sphere and demonstrate applicability to a complex aircraft geometry for monostatic radar cross-section prediction.
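BVH traversal repeatedly asks whether a ray hits a node's axis-aligned bounding box before testing the surface elements inside it. The following is a minimal Python sketch of the standard slab test used for that query. The function name and tuple-based inputs are hypothetical, and the paper's GPU implementation is not shown:

```python
import math

def ray_aabb_hit(origin, direction, box_min, box_max):
    """Slab test: return the entry distance t >= 0 if the ray hits the box, else None."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        # Intersect the parameter interval with this slab's interval.
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None
    return t_near
```

A traversal descends only into children whose boxes are hit. This is how the hierarchy prunes most ray-surface intersection tests.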
DC · Apr 8
Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS
Luca Pennati, Andong Hu, Ivy Peng et al.
GROMACS is a de facto standard for classical Molecular Dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge: embedding neural-network inference into multi-GPU simulations while retaining high performance. In this work, we integrate the MLIP framework DeePMD-kit into GROMACS, enabling domain-decomposed, GPU-accelerated inference across multi-node systems. We extend the GROMACS NNPot interface with a DeePMD backend, and we introduce a domain decomposition layer decoupled from the main simulation. The inference is executed concurrently on all processes, with two MPI collectives used each step to broadcast coordinates and to aggregate and redistribute forces. We train an in-house DPA-1 model (1.6 M parameters) on a dataset of solvated protein fragments. We validate the implementation on a small protein system, then benchmark the GROMACS-DeePMD integration with a 15,668-atom protein on NVIDIA A100 and AMD MI250x GPUs on up to 32 devices. Strong-scaling efficiency reaches 66% at 16 devices and 40% at 32; weak-scaling efficiency is 80% up to 16 devices and reaches 48% (MI250x) and 40% (A100) at 32 devices. Profiling with the ROCm System profiler shows that >90% of the wall time is spent in DeePMD inference, while MPI collectives contribute <10%, primarily because they act as a global synchronization point. The principal bottlenecks are the irreducible ghost-atom cost set by the cutoff radius, confirmed by a simple throughput model, and load imbalance across ranks. These results demonstrate that production MD with near ab initio fidelity is feasible at scale in GROMACS.
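The efficiency figures above follow the usual definitions of strong- and weak-scaling efficiency. The following small Python sketch takes the baseline device count `p_base` as a parameter, because the abstract does not state which configuration the paper uses as its reference:

```python
def strong_scaling_efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Fixed total problem size: the ideal runtime on p devices is t_base * p_base / p."""
    return (t_base * p_base) / (t_p * p)


def weak_scaling_efficiency(t_base: float, t_p: float) -> float:
    """Fixed problem size per device: the ideal runtime stays equal to t_base."""
    return t_base / t_p
```

For example, a run that takes 100 s on one device and 10 s on 16 devices has a strong-scaling efficiency of 100 / (16 × 10) = 0.625.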
LG · May 7, 2024
Decoding complexity: how machine learning is redefining scientific discovery
Ricardo Vinuesa, Paola Cinnella, Jean Rabault et al.
As modern scientific instruments generate vast amounts of data and the volume of information in the scientific literature continues to grow, machine learning (ML) has become an essential tool for organising, analysing, and interpreting these complex datasets. This paper explores the transformative role of ML in accelerating breakthroughs across a range of scientific disciplines. By presenting key examples -- such as brain mapping and exoplanet detection -- we demonstrate how ML is reshaping scientific research. We also explore different scenarios where different levels of knowledge of the underlying phenomenon are available, identifying strategies to overcome limitations and unlock the full potential of ML. Despite its advances, the growing reliance on ML poses challenges for research applications and rigorous validation of discoveries. We argue that even with these challenges, ML is poised to disrupt traditional methodologies and advance the boundaries of knowledge by enabling researchers to tackle increasingly complex problems. Thus, the scientific community can move beyond the necessary traditional oversimplifications to embrace the full complexity of natural systems, ultimately paving the way for interdisciplinary breakthroughs and innovative solutions to humanity's most pressing challenges.
LG · Jul 11, 2025
Partitioned Hybrid Quantum Fourier Neural Operators for Scientific Quantum Machine Learning
Paolo Marcandelli, Yuanchun He, Stefano Mariani et al.
We introduce the Partitioned Hybrid Quantum Fourier Neural Operator (PHQFNO), a generalization of the Quantum Fourier Neural Operator (QFNO) for scientific machine learning. PHQFNO partitions the Fourier operator computation across classical and quantum resources, enabling tunable quantum-classical hybridization and distributed execution across quantum and classical devices. The method extends QFNOs to higher dimensions and incorporates a message-passing framework to distribute data across different partitions. Input data are encoded into quantum states using unary encoding, and quantum circuit parameters are optimized using a variational scheme. We implement PHQFNO using PennyLane with PyTorch integration and evaluate it on Burgers' equation and the incompressible and compressible Navier-Stokes equations. We show that PHQFNO recovers classical FNO accuracy. On incompressible Navier-Stokes, PHQFNO achieves higher accuracy than its classical counterparts. Finally, we perform a sensitivity analysis under input noise, confirming improved stability of PHQFNO over classical baselines.
CE · Apr 25, 2025
Discovering Governing Equations of Geomagnetic Storm Dynamics with Symbolic Regression
Stefano Markidis, Jonah Ekelund, Luca Pennati et al.
Geomagnetic storms are large-scale disturbances of the Earth's magnetosphere driven by solar wind interactions, posing significant risks to space-based and ground-based infrastructure. The Disturbance Storm Time (Dst) index quantifies geomagnetic storm intensity by measuring global magnetic field variations. This study applies symbolic regression to derive data-driven equations describing the temporal evolution of the Dst index. We use historical data from the NASA OMNIweb database, including solar wind density, bulk velocity, convective electric field, dynamic pressure, and magnetic pressure. The PySR framework, an evolutionary algorithm-based symbolic regression library, is used to identify mathematical expressions linking dDst/dt to key solar wind parameters. The resulting models span a hierarchy of complexity levels and enable a comparison with well-established empirical models such as the Burton-McPherron-Russell and O'Brien-McPherron models. The best-performing symbolic regression models demonstrate superior accuracy in most cases, particularly during moderate geomagnetic storms, while maintaining physical interpretability. The models are evaluated on historical storm events, including the 2003 Halloween Storm, the 2015 St. Patrick's Day Storm, and a 2017 moderate storm. The results provide interpretable, closed-form expressions that capture nonlinear dependencies and thresholding effects in Dst evolution.
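The Burton-McPherron-Russell baseline mentioned above models the pressure-corrected index as dDst*/dt = Q(E) - Dst*/τ. The injection term Q switches on above an electric-field threshold. The following forward-Euler sketch omits the pressure correction, and its coefficient defaults are placeholders rather than fitted values from the empirical literature or from the paper:

```python
import numpy as np

def integrate_burton(e_field, dt_hours, dst0=0.0, d=-4.4, e_c=0.5, tau=7.7):
    """Forward-Euler integration of dDst*/dt = Q(E) - Dst*/tau.

    e_field: convective electric field samples in mV/m.
    Q(E) = d * (E - e_c) above the threshold e_c, else 0.
    d [nT/h per mV/m], e_c [mV/m], tau [h]: placeholder values, not fitted ones.
    Returns Dst* at the start and after each step (length len(e_field) + 1).
    """
    dst = np.empty(len(e_field) + 1)
    dst[0] = dst0
    for k, e in enumerate(e_field):
        q = d * (e - e_c) if e > e_c else 0.0   # ring-current injection
        dst[k + 1] = dst[k] + dt_hours * (q - dst[k] / tau)
    return dst
```

Under constant driving, the index relaxes toward the equilibrium Q·τ. With no driving, it decays back toward zero on the time scale τ. The thresholding and decay terms are the kind of structure that symbolic regression can rediscover or refine.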
LG · Apr 22, 2025
Adaptive PCA-Based Outlier Detection for Multi-Feature Time Series in Space Missions
Jonah Ekelund, Savvas Raptis, Vicki Toy-Edens et al.
Analyzing multi-feature time series data is critical for space missions, making efficient event detection, potentially onboard, essential for automatic analysis. However, limited onboard computational resources and data downlink constraints necessitate robust methods for identifying regions of interest in real time. This work presents an adaptive outlier detection algorithm based on the reconstruction error of Principal Component Analysis (PCA) for feature reduction, designed explicitly for space mission applications. The algorithm adapts dynamically to evolving data distributions by using Incremental PCA, enabling deployment without a predefined model for all possible conditions. A pre-scaling process normalizes each feature's magnitude while preserving relative variance within feature types. We demonstrate the algorithm's effectiveness in detecting space plasma events, such as distinct space environments, dayside and nightside transient phenomena, and transition layers, in observations from NASA's MMS mission. Additionally, we apply the method to NASA's THEMIS data, successfully identifying a dayside transient using onboard-available measurements.
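The following is a minimal NumPy sketch of reconstruction-error outlier scoring with group-wise pre-scaling. Several parts are assumptions: the grouping interpretation of the pre-scaling, the helper names, and the use of batch PCA via SVD. The paper uses Incremental PCA; scikit-learn's `IncrementalPCA.partial_fit` is one real implementation that could replace the refit here:

```python
import numpy as np

def group_scale(x: np.ndarray, groups) -> np.ndarray:
    """Center every feature, then divide each feature group by one shared scale,
    so relative variance inside a group is preserved.
    groups: list of column-index lists (hypothetical grouping)."""
    out = x.astype(float) - x.mean(axis=0)
    for cols in groups:
        scale = out[:, cols].std()       # one shared scale per feature group
        if scale > 0:
            out[:, cols] /= scale
    return out

def pca_reconstruction_error(x_ref: np.ndarray, x_new: np.ndarray, k: int) -> np.ndarray:
    """Fit PCA on x_ref and return the per-sample squared reconstruction error of x_new."""
    mean = x_ref.mean(axis=0)
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    basis = vt[:k]                        # top-k principal axes
    centered = x_new - mean
    recon = centered @ basis.T @ basis    # project onto the subspace and map back
    return ((centered - recon) ** 2).sum(axis=1)
```

Samples whose error exceeds a threshold would be flagged as candidate events, for example a high percentile of the errors on recent reference data. This threshold rule is an assumption, not the paper's.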
CL · Apr 12, 2025
Optimizing FDTD Solvers for Electromagnetics: A Compiler-Guided Approach with High-Level Tensor Abstractions
Yifei He, Måns I. Andersson, Stefano Markidis
The Finite Difference Time Domain (FDTD) method is a widely used numerical technique for solving Maxwell's equations, particularly in computational electromagnetics and photonics. It enables accurate modeling of wave propagation in complex media and structures but comes with significant computational challenges. Traditional FDTD implementations rely on handwritten, platform-specific code that optimizes certain kernels while underperforming in others. The lack of portability increases development overhead and creates performance bottlenecks, limiting scalability across modern hardware architectures. To address these challenges, we introduce an end-to-end domain-specific compiler based on the MLIR/LLVM infrastructure for FDTD simulations. Our approach generates efficient and portable code optimized for diverse hardware platforms. We implement the three-dimensional FDTD kernel as operations on a 3D tensor abstraction with explicit computational semantics. High-level optimizations such as loop tiling, fusion, and vectorization are automatically applied by the compiler. We evaluate our customized code generation pipeline on Intel, AMD, and ARM platforms, achieving up to $10\times$ speedup over a baseline Python implementation using NumPy.
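The stencil structure that a compiler tiles, fuses, and vectorizes is easiest to see in one dimension. The following is a minimal NumPy sketch of the normalized 1D Yee update, written in the spirit of a NumPy baseline but not taken from the authors' code. The grid size, Courant number, and Gaussian source are illustrative assumptions:

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=300, courant=0.5, src=100):
    """Normalized 1D Yee scheme: Ez and Hy staggered by half a cell and half a step."""
    ez = np.zeros(n_cells)
    hy = np.zeros(n_cells - 1)
    for t in range(n_steps):
        # H update from the spatial difference of E (a vectorized stencil).
        hy += courant * (ez[1:] - ez[:-1])
        # E update from the spatial difference of H; end cells stay 0 (PEC walls).
        ez[1:-1] += courant * (hy[1:] - hy[:-1])
        # Soft Gaussian source added at one cell (hypothetical excitation).
        ez[src] += np.exp(-((t - 30) / 10.0) ** 2)
    return ez, hy
```

The paper applies the same kind of update to 3D tensors. Each vectorized difference above is a candidate for the tiling and fusion optimizations the compiler performs.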
LG · Apr 4, 2025
Discovering Partially Known Ordinary Differential Equations: a Case Study on the Chemical Kinetics of Cellulose Degradation
Federica Bragone, Kateryna Morozovska, Tor Laneryd et al.
Measuring the degree of polymerization (DP) is one of the methods for estimating the aging of polymer-based insulation systems, such as cellulose insulation in power components. The main degradation mechanisms in polymers are hydrolysis, pyrolysis, and oxidation. Combined, these mechanisms reduce the DP. However, data for these types of problems are usually scarce. This study analyzes insulation aging using cellulose degradation data from power transformers. The aging of cellulose immersed in mineral oil inside power transformers is modeled with ordinary differential equations (ODEs). We recover the governing equations of the degradation system using Physics-Informed Neural Networks (PINNs) and symbolic regression. We apply PINNs to discover the unknown parameters of the Arrhenius equation in the Ekenstam ODE, which describes cellulose contamination content and the temperature-dependent material aging process, for both synthetic data and real DP values. A modification of the Ekenstam ODE is given by Emsley's system of ODEs, in which the rate constant expressed by the Arrhenius equation decreases over time. We use PINNs and symbolic regression to recover the functional form of one of the ODEs of the system and to identify an unknown parameter.
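The following sketch gives the two kinetic models in closed form. The first is Ekenstam, $1/DP(t) - 1/DP_0 = k\,t$, with Arrhenius rate $k = A\,e^{-E_a/(R T)}$. The second is an Emsley-type variant in which the rate decays as $k(t) = k_{10} e^{-k_2 t}$. The exponential-decay form of the Emsley rate and all parameter values are assumptions for illustration, not the paper's fitted results:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def arrhenius(a: float, ea: float, temp_k: float) -> float:
    """Rate constant k = A exp(-Ea / (R T))."""
    return a * math.exp(-ea / (R * temp_k))

def dp_ekenstam(dp0: float, k: float, t: float) -> float:
    """Ekenstam: 1/DP(t) - 1/DP0 = k t, the solution of dDP/dt = -k DP**2."""
    return 1.0 / (1.0 / dp0 + k * t)

def dp_emsley(dp0: float, k10: float, k2: float, t: float) -> float:
    """Emsley-type variant with k(t) = k10 exp(-k2 t), integrated in closed form:
    1/DP(t) - 1/DP0 = (k10 / k2) (1 - exp(-k2 t))."""
    # -expm1(-x) computes 1 - exp(-x) accurately for small x.
    return 1.0 / (1.0 / dp0 + (k10 / k2) * -math.expm1(-k2 * t))
```

As $k_2 \to 0$, the Emsley-type curve recovers the Ekenstam curve. A positive $k_2$ instead makes the DP level off rather than decay indefinitely.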
AI · Jun 20, 2024
AI in Space for Scientific Missions: Strategies for Minimizing Neural-Network Model Upload
Jonah Ekelund, Ricardo Vinuesa, Yuri Khotyaintsev et al.
Artificial Intelligence (AI) has the potential to revolutionize space exploration by delegating several spacecraft decisions to an onboard AI instead of relying on ground control and predefined procedures. It is likely that there will be an AI/ML Processing Unit onboard the spacecraft running an inference engine. The neural network will have pre-installed parameters that can be updated onboard by uploading, via telecommands, parameters obtained by training on the ground. However, satellite uplinks have limited bandwidth and transmissions can be costly. Furthermore, a mission operating with a suboptimal neural network will miss out on valuable scientific data. Smaller networks can thereby decrease the uplink cost, while increasing the value of the scientific data that is downloaded. In this work, we evaluate and discuss the use of reduced-precision and bare-minimum neural networks to reduce the time for upload. As an example of an AI use case, we focus on NASA's Magnetospheric Multiscale (MMS) mission. We show how an AI onboard could be used in the Earth's magnetosphere to classify data to selectively downlink higher-value data or to recognize a region of interest to trigger burst mode, collecting data at a high rate. Using a simple filtering scheme and algorithm, we show how the start and end of a region of interest can be detected in a stream of classifications. To provide the classifications, we use an established Convolutional Neural Network (CNN) trained to an accuracy >94%. We also show how the network can be reduced to a single linear layer and trained to the same accuracy as the established CNN, thereby reducing the overall size of the model by up to 98.9%. We further show how each network can be reduced by up to 75% of its original size by using lower-precision formats to represent the network parameters, with a change in accuracy of less than 0.6 percentage points.
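One way a 75% reduction can arise is that storing 32-bit floating-point parameters in an 8-bit format takes a quarter of the space. The following is a minimal NumPy sketch of symmetric per-tensor int8 quantization. The abstract does not specify which formats were used, so int8 is only one possible lower-precision format, and the layer shape is hypothetical:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization so that w ≈ scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    if scale == 0.0:
        scale = 1.0                      # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Hypothetical single linear layer: 256 inputs -> 4 output classes.
rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 4)).astype(np.float32)
q, scale = quantize_int8(weights)
restored = q.astype(np.float32) * scale  # dequantized parameters for inference
print(q.nbytes / weights.nbytes)         # 0.25: int8 uses a quarter of float32 storage
```

Rounding bounds the per-parameter error by half the scale. Whether this costs accuracy has to be checked on the task, as the abstract's accuracy comparison does.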
LG · Jul 14, 2021
Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain
Martin Svedin, Artur Podobas, Steven W. D. Chien et al.
One of the most promising approaches for data analysis and exploration of large data sets is Machine Learning techniques that are inspired by brain models. Such methods use alternative learning rules that are potentially more efficient than established ones. In this work, we focus on the potential of brain-inspired ML for exploiting High-Performance Computing (HPC) resources to solve ML problems: we discuss BCPNN and its HPC implementation, called StreamBrain, along with its computational cost and suitability for HPC systems. As an example, we use StreamBrain to analyze the Higgs Boson dataset from High Energy Physics and discriminate between background and signal classes in collision events from high-energy particle colliders. Overall, we reach up to 69.15% accuracy and 76.4% Area Under the Curve (AUC) performance.
PLASM-PH · Jul 5, 2021
A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations
Xavier Aguilar, Stefano Markidis
We design and develop a new Particle-in-Cell (PIC) method for plasma simulations using Deep Learning (DL) to calculate the electric field from the electron phase space. We train a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN) to solve the two-stream instability test. We verify that the DL-based MLP PIC method produces the correct results using the two-stream instability: the DL-based PIC provides the expected growth rate of the two-stream instability. The DL-based PIC does not conserve the total energy and momentum. However, the DL-based PIC method is stable against the cold-beam instability, which affects traditional PIC methods. This work shows that integrating DL technologies into traditional computational methods is a viable approach for developing next-generation PIC algorithms.
DC · Jun 9, 2021
StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs
Artur Podobas, Martin Svedin, Steven W. D. Chien et al.
The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In this paper, we introduce StreamBrain -- a framework that allows neural networks based on BCPNN to be practically deployed in High-Performance Computing systems. StreamBrain is a domain-specific language (DSL), similar in concept to existing machine learning (ML) frameworks, and supports backends for CPUs, GPUs, and even FPGAs. We empirically demonstrate that StreamBrain can train the well-known ML benchmark dataset MNIST within seconds, and we are the first to demonstrate BCPNN on STL-10 size networks. We also show how StreamBrain can be used to train with custom floating-point formats and illustrate the impact of using different bfloat variations on BCPNN using FPGAs.
COMP-PH · Oct 11, 2020
Automatic Particle Trajectory Classification in Plasma Simulations
Stefano Markidis, Ivy Peng, Artur Podobas et al.
Numerical simulations of plasma flows are crucial for advancing our understanding of microscopic processes that drive the global plasma dynamics in fusion devices, space, and astrophysical systems. Identifying and classifying particle trajectories allows us to determine specific ongoing acceleration mechanisms, shedding light on essential plasma processes. Our overall goal is to provide a general workflow for exploring particle trajectory space and automatically classifying particle trajectories from plasma simulations in an unsupervised manner. We combine pre-processing techniques, such as Fast Fourier Transform (FFT), with Machine Learning methods, such as Principal Component Analysis (PCA), k-means clustering algorithms, and silhouette analysis. We demonstrate our workflow by classifying electron trajectories during magnetic reconnection. Our method successfully recovers existing results from previous literature without a priori knowledge of the underlying system. Our workflow can be applied to analyzing particle trajectories in different phenomena, from magnetic reconnection and shocks to magnetospheric flows. The workflow has no dependence on any physics model and can identify particle trajectories and acceleration mechanisms that were not detected before.
SPACE-PH · Aug 15, 2019
Automated classification of plasma regions using 3D particle energy distributions
Vyacheslav Olshevsky, Yuri V. Khotyaintsev, Ahmad Lalti et al.
We investigate the properties of the ion sky maps produced by the Dual Ion Spectrometers (DIS) from the Fast Plasma Investigation (FPI). We have trained a convolutional neural network classifier to predict four regions crossed by the MMS spacecraft on the dayside: solar wind, ion foreshock, magnetosheath, and magnetopause, using solely DIS spectrograms. The accuracy of the classifier is >98%. We use the classifier to detect mixed plasma regions, in particular to find the bow shock regions. A similar approach can be used to identify the magnetopause crossings and reveal regions prone to magnetic reconnection. Data processing through the trained classifier is fast and efficient and thus can be used to classify the whole MMS database.