Stefan Henneking

DC
4 papers
4 citations
Novelty: 56%
AI Score: 50

4 Papers

79.9 · DC · Mar 10 · Code
Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores

Jiqun Tu, Ian Karlin, John Camier et al.

Finite element simulations play a critical role in a wide range of applications, from automotive design to tsunami modeling and computational electromagnetics. Performing these simulations efficiently at the high resolutions needed for practical applications and scientific insights necessitates the use of high-order methods and large-scale supercomputing. While much progress has been made in porting finite element codes to GPU systems in recent years, additional improvements in the efficiency and computational speed of GPU-accelerated high-order finite element simulations are in constant demand. In this paper, we demonstrate that the FP64 tensor cores on NVIDIA GPUs can be used to further accelerate such simulations, achieving significant speedups in key kernels of MFEM, a scalable open-source finite element library widely used in HPC applications. By integrating FP64 tensor cores with kernel fusion optimizations, we were able to achieve up to 2× performance gains and up to 83% energy efficiency gains on NVIDIA's Grace Hopper GH200 and Grace Blackwell GB200 architectures. To the best of our knowledge, this is the first time that FP64 tensor cores have been directly programmed to accelerate large-scale finite element scientific computing applications. We demonstrate the performance of the optimized kernels at exascale by showing near-perfect weak scaling efficiency and 90% strong scaling efficiency across nearly 10,000 GPUs on the Alps system. The new algorithms and MFEM enhancements directly benefit complex production codes, including the 2025 Gordon Bell Prize-winning application for real-time tsunami forecasting.

65.6 · GEO-PH · Mar 16
Real-time probabilistic tsunami forecasting in Cascadia from sparse offshore pressure observations

Stefan Henneking, Fabian Kutschera, Sreeram Venkat et al.

Near-field tsunami early warning in the Cascadia Subduction Zone is limited by sparse offshore observations. We show that a hypothetical network of 175 seafloor pressure sensors can support real-time Bayesian inference of tsunamigenic seafloor motion and probabilistic tsunami forecasts for two fully coupled Cascadia earthquake dynamic rupture–tsunami scenarios, a partial rupture and a margin-wide rupture. The complex oceanic acoustic, Rayleigh, and tsunami wavefields in both scenarios are similar during the first two minutes and then diverge. Using an acoustic–gravity inversion with offline precomputation and online assimilation of pressure data, tsunami forecasts are obtained in less than a second. We leverage a Bayesian inversion-based framework that splits the computations into an offline precomputation phase performed with large-scale computing facilities, and an online phase that computes forecasts from real-time data and can be executed on a laptop. Forecast errors remain low at 22.1% for the margin-wide rupture and 19.6% for the partial rupture.
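
The offline/online split described in this abstract can be illustrated with a toy linear-Gaussian problem. The sketch below is not the authors' implementation. It assumes a linear data map F, a linear forecast map Fq, and scaled-identity prior and noise covariances; all function names and sizes are hypothetical. The offline phase does the expensive linear algebra once, and the online phase reduces to a single matrix-vector product with the observed data.

```python
# Toy offline/online split for a linear-Gaussian inverse problem.
# Assumptions (not from the abstract): data d = F m + noise, prior m ~ N(0, prior_var * I),
# noise ~ N(0, noise_var * I), forecast q = Fq m. All names here are hypothetical.

def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(K, B):
    """Solve K X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(K)
    M = [list(K[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def offline_phase(F, Fq, prior_var, noise_var):
    """Precompute the data-to-forecast operator Q = Fq C F^T (F C F^T + N)^-1."""
    CFt = [[prior_var * v for v in row] for row in transpose(F)]  # C F^T
    K = matmul(F, CFt)                                            # F C F^T (data space)
    for i in range(len(K)):
        K[i][i] += noise_var                                      # add noise covariance
    G = matmul(Fq, CFt)                                           # Fq C F^T
    # K is symmetric, so Q = G K^-1 is the transpose of K^-1 G^T.
    return transpose(solve(K, transpose(G)))

def online_phase(Q, d):
    """Map observed data to the posterior-mean forecast with one mat-vec."""
    return [sum(q * x for q, x in zip(row, d)) for row in Q]

if __name__ == "__main__":
    F = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]   # 2 sensors, 3 parameters
    Fq = [[1.0, 1.0, 1.0]]                   # 1 forecast quantity
    Q = offline_phase(F, Fq, prior_var=1.0, noise_var=0.1)
    print(online_phase(Q, [0.5, -0.2]))
```

In this toy setting the online cost depends only on the number of sensors and forecast quantities, not on the parameter dimension, which is the property that makes a laptop-scale online phase plausible.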

61.0 · DC · Apr 9
Sensor Placement for Tsunami Early Warning via Large-Scale Bayesian Optimal Experimental Design

Sreeram Venkat, Stefan Henneking, Omar Ghattas

Real-time tsunami early warning relies on distributed sensor networks to infer seismic sources and seafloor motion. Optimizing these networks via Bayesian optimal experimental design (OED) is exceptionally challenging for systems governed by hyperbolic partial differential equations, which lack the spectral decay required by standard low-rank approximations. We present a scalable Bayesian OED framework for linear time-invariant systems. By reformulating the inverse problem in the data space, we transform OED into dense matrix subset selection. We propose a multi-GPU, Schur-complement-update-based, greedy algorithm that solves the OED problem using a pipelined approach that fully overlaps I/O with GPU computations. Our framework achieves near-perfect weak and strong scaling across hundreds of GPUs on Perlmutter and Frontier. Applied to the 2025 Gordon Bell Prize-winning digital twin for tsunami forecasting in the Cascadia Subduction Zone, we optimize a 175-sensor network, minimizing the uncertainty of a parameter field with over one billion degrees of freedom.
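
The idea of greedy subset selection with Schur complement updates can be sketched on a small dense matrix. This is a single-process illustration, not the paper's multi-GPU pipelined algorithm. It assumes a log-determinant (D-optimal style) criterion on a symmetric positive definite data-space matrix A indexed by candidate sensors, for example a noise-normalized matrix of the form I + G/σ² in a linear-Gaussian setting; both the criterion and the name `greedy_select` are assumptions made for illustration.

```python
import math

def greedy_select(A, k):
    """Greedily pick k indices of an SPD matrix A to maximize the log-determinant
    of the selected principal submatrix, using rank-one Schur complement updates."""
    n = len(A)
    S = [row[:] for row in A]   # running Schur complement w.r.t. the chosen indices
    chosen, logdet = [], 0.0
    for _ in range(k):
        # The diagonal of S holds each candidate's marginal determinant factor.
        j = max((i for i in range(n) if i not in chosen), key=lambda i: S[i][i])
        pivot = S[j][j]
        logdet += math.log(pivot)
        chosen.append(j)
        # Schur complement update: S <- S - S[:, j] S[j, :] / S[j, j].
        col = [S[i][j] for i in range(n)]
        for r in range(n):
            for c in range(n):
                S[r][c] -= col[r] * col[c] / pivot
    return chosen, logdet

if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0],
         [1.0, 3.0, 0.0],
         [0.0, 0.0, 1.5]]
    print(greedy_select(A, 2))
```

Each greedy step costs one rank-one update of the candidate matrix instead of a fresh factorization of the selected submatrix, which is what makes the update-based formulation attractive for dense problems.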

7.0 · OPTICS · Mar 31
Bent optical waveguide finite element analysis with a 3D envelope Maxwell model

Jaime Mora-Paz, Stefan Henneking, Leszek Demkowicz et al.

With the goal of accurately extracting the optical field losses in a three-dimensional (3D), circularly coiled waveguide (e.g., bent optical fiber), this effort presents the numerical methodologies implemented for an envelope Maxwell model that propagates electromagnetic fields entirely as a boundary value problem. Our modeling approach includes an ultraweak variational formulation of the envelope Maxwell model in the curved geometry of the bend, discretized by the discontinuous Petrov-Galerkin (DPG) method, which permits residual-driven mesh and polynomial-order adaptivity. This in turn requires a dedicated approach for constructing perfectly matched layers (PMLs) as absorbing boundary conditions both in the direction of optical field propagation and in the tangential directions, where unguided energy escapes the waveguide. Our coiled waveguide modeling technology extracts the mode confinement losses from the propagation of the coherent optical field through the bent waveguide. We verify our simulations against the semi-analytical results from the analogous bent slab waveguide problem, and we successfully demonstrate stable convergence to loss values for the 3D coiled optical fiber problem, which has never been done previously for our specific modeling approach.