NA | Jul 25, 2018 | Code
The Target-Matrix Optimization Paradigm for High-Order Meshes
Veselin Dobrev, Patrick Knupp, Tzanio Kolev et al.
We describe a framework for controlling and improving the quality of high-order finite element meshes based on extensions of the Target-Matrix Optimization Paradigm (TMOP) of Knupp. This approach gives high-order applications very precise control over local mesh quality, while still improving the mesh globally. We address the adaptation of various TMOP components to the setting of general isoparametric element mappings, including the mesh quality metric in 2D and 3D, the selection of sample points and the solution of the resulting mesh optimization problem. We also investigate additional practical concerns, such as tangential relaxation and restricting the deviation from the original mesh. The benefits of the new high-order TMOP algorithms are illustrated on a number of test problems and examples from a high-order arbitrary Lagrangian-Eulerian (ALE) application. Our implementation is freely available in an open-source library form.
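To make the idea of a target-matrix quality metric concrete, here is a minimal sketch, not the authors' implementation. It assumes the common TMOP setup in which the Jacobian A of the element mapping at a sample point is compared with a target matrix W through T = A W^{-1}, and it evaluates one 2D shape metric of the form |T|_F^2 / (2 det T) - 1. The function names are hypothetical and chosen for illustration only.

```python
def mat_inv_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul_2x2(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def shape_metric_2d(A, W):
    """Shape metric mu(T) = |T|_F^2 / (2 det T) - 1 with T = A W^{-1}.

    The value is 0 when A matches W up to rotation and uniform scaling,
    and grows as the local shape deviates from the target.
    """
    T = mat_mul_2x2(A, mat_inv_2x2(W))
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    if det <= 0.0:
        # Assumption of this sketch: inverted sample points are rejected.
        raise ValueError("inverted or degenerate mapping at this sample point")
    frob2 = sum(T[i][j] ** 2 for i in range(2) for j in range(2))
    return frob2 / (2.0 * det) - 1.0


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[2.0, 0.0], [0.0, 1.0]]
print(shape_metric_2d(stretched, identity))
```

In an optimization setting, a metric like this would be summed over sample points in every element and minimized with respect to the mesh node positions.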
80.0 | DC | Mar 10 | Code
Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores
Jiqun Tu, Ian Karlin, John Camier et al.
Finite element simulations play a critical role in a wide range of applications, from automotive design to tsunami modeling and computational electromagnetics. Performing these simulations efficiently at the high resolutions needed for practical applications and scientific insights necessitates the use of high-order methods and large-scale supercomputing. While much progress has been made in porting finite element codes to GPU systems in recent years, additional improvements in the efficiency and computational speed of GPU-accelerated high-order finite element simulations are in constant demand. In this paper, we demonstrate that the FP64 tensor cores on NVIDIA GPUs can be used to further accelerate such simulations, achieving significant speedups in key kernels of MFEM, a scalable open-source finite element library widely used in HPC applications. By integrating FP64 tensor cores with kernel fusion optimizations, we were able to achieve up to 2$\times$ performance gains and up to 83% energy efficiency gains on NVIDIA's Grace Hopper GH200 and Grace Blackwell GB200 architectures. To the best of our knowledge, this is the first time that FP64 tensor cores have been directly programmed to accelerate large-scale finite element scientific computing applications. We demonstrate the performance of the optimized kernels at exascale by showing near-perfect weak scaling efficiency and 90% strong scaling efficiency across nearly 10,000 GPUs on the Alps system. The new algorithms and MFEM enhancements directly benefit complex production codes, including the 2025 Gordon Bell Prize-winning application for real-time tsunami forecasting.
NA | Dec 2, 2016
A scalable preconditioner for a DPG method
Andrew T. Barker, Veselin Dobrev, Jay Gopalakrishnan et al.
We show how a scalable preconditioner for the primal discontinuous Petrov-Galerkin (DPG) method can be developed using existing algebraic multigrid (AMG) preconditioning techniques. The stability of the DPG method gives a norm equivalence which allows us to exploit existing AMG algorithms and software. We show how these algebraic preconditioners can be applied directly to a Schur complement system of interface unknowns arising from the DPG method. To the best of our knowledge, this is the first massively scalable algebraic preconditioner for DPG problems.
91.4 | NA | Apr 17
Algebraic Multigrid with Filtering: An Efficient Preconditioner for Interior Point Methods in Large-Scale Contact Mechanics Optimization
Socratis Petrides, Tucker Hartland, Tzanio Kolev et al.
Large-scale contact mechanics simulations are crucial in many engineering fields such as structural design and manufacturing. In the frictionless case, contact can be modeled by minimizing an energy functional; however, these problems are often nonlinear, nonconvex, and increasingly difficult to solve as mesh resolution increases. In this work, we employ a Newton-based interior-point (IP) filter line-search method, an effective approach for large-scale constrained optimization. While this method converges rapidly, each iteration requires solving a large saddle-point linear system that becomes ill-conditioned as the optimization process converges, largely due to the IP treatment of the contact constraints. Such ill-conditioning can hinder solver scalability and increase iteration counts with mesh refinement. To address this, we introduce a novel preconditioner, AMG with filtering (AMGF), tailored to the Schur complement of the saddle-point system. Building on the classical AMG solver, commonly used for elasticity, we augment it with a specialized subspace correction that filters near-null-space components introduced by contact interface constraints. Through theoretical analysis and numerical experiments on a range of linear and nonlinear contact problems, we demonstrate that AMGF achieves mesh-independent convergence and maintains robustness against the ill-conditioning that notoriously plagues IP methods. These results indicate that AMGF makes contact mechanics simulations more tractable and broadens the applicability of Newton-based IP methods in challenging engineering scenarios. More broadly, AMGF is well suited for problems where solver performance is limited by a low-dimensional subspace, such as those arising from localized constraints, interface conditions or model heterogeneities, making it applicable beyond contact mechanics and constrained optimization.
LG | Mar 1, 2021
Reinforcement Learning for Adaptive Mesh Refinement
Jiachen Yang, Tarik Dzanic, Brenden Petersen et al.
Large-scale finite element simulations of complex physical systems governed by partial differential equations (PDEs) crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required. Existing scalable AMR methods make heuristic refinement decisions based on instantaneous error estimation and thus do not aim for long-term optimality over an entire simulation. We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning (RL) to train refinement policies directly from simulation. AMR poses a new problem for RL as both the state dimension and the available action set change at every step, which we solve by proposing new policy architectures with differing generality and inductive bias. The model sizes of these policy architectures are independent of the mesh size and hence can be deployed on larger simulations than those used at train time. We demonstrate in comprehensive experiments on static function estimation and time-dependent equations that RL policies can be trained on problems without using ground truth solutions, are competitive with a widely used error estimator, and generalize to larger, more complex, and unseen test problems.
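The Markov decision process view of AMR can be illustrated with a toy environment. The sketch below is not the paper's formulation: the class name `AMREnv1D`, the 1D piecewise-linear setting, the reward (drop in summed interpolation error against a known function), and the fixed refinement budget are all assumptions made for illustration. It does show the property the abstract highlights: the observation and the set of valid actions both grow with every refinement.

```python
class AMREnv1D:
    """Toy 1D refinement environment on [0, 1] (hypothetical, for illustration).

    State:  list of element intervals (a, b).
    Action: index of the element to bisect; valid actions are
            0 .. len(state) - 1, so the action set changes every step.
    Reward: decrease in the summed per-element interpolation error.
    """

    def __init__(self, f, n_init=2, budget=4):
        self.f = f
        self.n_init = n_init
        self.budget = budget
        self.elems = []
        self.steps = 0

    def reset(self):
        h = 1.0 / self.n_init
        self.elems = [(i * h, (i + 1) * h) for i in range(self.n_init)]
        self.steps = 0
        return list(self.elems)

    def _elem_error(self, a, b, samples=16):
        # Max sampled gap between f and its linear interpolant on [a, b].
        fa, fb = self.f(a), self.f(b)
        err = 0.0
        for k in range(1, samples):
            t = k / samples
            x = a + t * (b - a)
            err = max(err, abs(self.f(x) - ((1.0 - t) * fa + t * fb)))
        return err

    def total_error(self):
        return sum(self._elem_error(a, b) for a, b in self.elems)

    def step(self, action):
        before = self.total_error()
        a, b = self.elems[action]
        mid = 0.5 * (a + b)
        self.elems[action:action + 1] = [(a, mid), (mid, b)]
        self.steps += 1
        reward = before - self.total_error()
        done = self.steps >= self.budget
        return list(self.elems), reward, done


env = AMREnv1D(lambda x: x * x)
state = env.reset()
state, reward, done = env.step(0)  # bisect the first element
print(len(state), reward, done)
```

A learned policy would replace the hard-coded action with a choice computed from the current state. Because the number of elements varies, such a policy needs an architecture whose size does not depend on the mesh size, which is the design constraint the abstract describes.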
NA | May 10, 2019
Non-Conforming Mesh Refinement for High-Order Finite Elements
Jakub Červený, Veselin Dobrev, Tzanio Kolev
We propose a general algorithm for non-conforming adaptive mesh refinement (AMR) of unstructured meshes in high-order finite element codes. Our focus is on h-refinement with a fixed polynomial order. The algorithm handles triangular, quadrilateral, hexahedral and prismatic meshes of arbitrarily high-order curvature, for any order finite element space in the de Rham sequence. We present a flexible data structure for meshes with hanging nodes and a general procedure to construct the conforming interpolation operator, both in serial and in parallel. The algorithm and data structure allow anisotropic refinement of tensor product elements in 2D and 3D, and support unlimited refinement ratios of adjacent elements. We report numerical experiments verifying the correctness of the algorithms, and perform a parallel scaling study to show that we can adapt meshes containing billions of elements and run efficiently on 393,000 parallel tasks. Finally, we illustrate the integration of dynamic AMR into a high-order Lagrangian hydrodynamics solver.
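The role of a conforming interpolation operator can be shown on the simplest case. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it covers only linear (lowest-order) elements and a single level of hanging nodes, where each constrained degree of freedom depends directly on unconstrained ones. The function names and the `constraints` dictionary layout are hypothetical.

```python
def build_conforming_interpolation(n_dofs, constraints):
    """Build a dense operator P mapping true (unconstrained) dofs to all dofs.

    constraints maps a hanging dof to a list of (master_dof, weight) pairs.
    Assumption of this sketch: every master dof is itself unconstrained.
    """
    true_dofs = [d for d in range(n_dofs) if d not in constraints]
    column = {d: j for j, d in enumerate(true_dofs)}
    P = [[0.0] * len(true_dofs) for _ in range(n_dofs)]
    for d in range(n_dofs):
        if d in constraints:
            # Hanging dof: interpolate from its master dofs.
            for master, weight in constraints[d]:
                P[d][column[master]] += weight
        else:
            # True dof: copied through unchanged.
            P[d][column[d]] = 1.0
    return P, true_dofs


def apply_operator(P, x):
    """Multiply the dense operator P by the vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


# Dofs 0 and 1 are the endpoints of a coarse edge, dof 2 is the hanging
# node at its midpoint, and dof 3 is an unrelated free dof.
P, true_dofs = build_conforming_interpolation(
    4, {2: [(0, 0.5), (1, 0.5)]})
print(true_dofs)
print(apply_operator(P, [1.0, 3.0, 7.0]))
```

With such an operator, a system assembled on all degrees of freedom as A can be solved on the true degrees of freedom through the product P^T A P, and the hanging values are recovered by applying P to the solution.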