NA | Nov 24, 2018
Unified geometric multigrid algorithm for hybridized high-order finite element methods
Tim Wildey, Sriramkrishnan Muralikrishnan, Tan Bui-Thanh
We consider a standard elliptic partial differential equation and propose a geometric multigrid algorithm based on Dirichlet-to-Neumann (DtN) maps for hybridized high-order finite element methods. The proposed unified approach is applicable to any locally conservative hybridized finite element method including multinumerics with different hybridized methods in different parts of the domain. For these methods, the linear system involves only the unknowns residing on the mesh skeleton, and constructing intergrid transfer operators is therefore not trivial. The key to our geometric multigrid algorithm is the physics-based energy-preserving intergrid transfer operators which depend only on the fine scale DtN maps. Thanks to these operators, we completely avoid upscaling of parameters and no information regarding subgrid physics is explicitly required on coarse meshes. Moreover, our algorithm is agglomeration-based and can straightforwardly handle unstructured meshes. We perform extensive numerical studies with hybridized mixed methods, hybridized discontinuous Galerkin method, weak Galerkin method, and a hybridized version of interior penalty discontinuous Galerkin methods on a range of elliptic problems including subsurface flow through highly heterogeneous porous media. We compare the performance of different smoothers and analyze the effect of stabilization parameters on the scalability of the multigrid algorithm.
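To make concrete why the condensed system involves only skeleton unknowns, the following minimal sketch eliminates the interior unknowns of a small 1D finite-difference Laplacian through a Schur complement, which acts here as a discrete DtN-type operator on the retained nodes. The matrix, the choice of "skeleton" nodes, and the helper name `condense` are assumptions made for illustration; none of them comes from the paper, and this is not the authors' multigrid algorithm.

```python
import numpy as np

def condense(A, b, skel):
    """Eliminate all non-skeleton unknowns of A u = b.

    Returns the Schur complement S, the condensed right-hand side g,
    and a function that recovers the full solution from skeleton values.
    """
    n = A.shape[0]
    skel = np.asarray(skel)
    inner = np.setdiff1d(np.arange(n), skel)  # interior (eliminated) unknowns
    A_ii = A[np.ix_(inner, inner)]
    A_is = A[np.ix_(inner, skel)]
    A_si = A[np.ix_(skel, inner)]
    A_ss = A[np.ix_(skel, skel)]
    # Schur complement: the operator seen by the skeleton unknowns alone
    S = A_ss - A_si @ np.linalg.solve(A_ii, A_is)
    g = b[skel] - A_si @ np.linalg.solve(A_ii, b[inner])

    def recover(u_s):
        # Local solves: interior values follow from the skeleton values
        u = np.empty(n)
        u[skel] = u_s
        u[inner] = np.linalg.solve(A_ii, b[inner] - A_is @ u_s)
        return u

    return S, g, recover

# Hypothetical example: 9 nodes, with two nodes playing the role of the skeleton
n = 9
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
S, g, recover = condense(A, b, skel=[2, 5])
u = recover(np.linalg.solve(S, g))
```

The global solve in this sketch involves only the 2x2 matrix `S`; the interior values are recovered afterwards by independent local solves.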
NA | Nov 14, 2017
An Improved Iterative HDG Approach for Partial Differential Equations
Sriramkrishnan Muralikrishnan, Minh-Binh Tran, Tan Bui-Thanh
We propose and analyze an iterative high-order hybridized discontinuous Galerkin (iHDG) discretization for linear partial differential equations. We improve our previous work (SIAM J. Sci. Comput. Vol. 39, No. 5, pp. S782--S808) in several directions: 1) the improved iHDG approach converges in a finite number of iterations for the scalar transport equation; 2) it is unconditionally convergent for both the linearized shallow water system and the convection-diffusion equation; 3) it has improved stability and convergence rates; 4) we uncover a relationship between the number of iterations and the time stepsize, solution order, mesh size, and equation parameters, which allows us to choose the time stepsize such that the number of iterations is approximately independent of the solution order and the mesh size; and 5) we provide both strong and weak scaling results for the improved iHDG approach up to $16,384$ cores. A connection between iHDG and time integration methods such as parareal and implicit/explicit methods is discussed. Extensive numerical results are presented to verify the theoretical findings.
26.8 | CE | May 11 | Code
On Distributed Parallelization Strategies for Particle-in-Fourier Schemes
Sriramkrishnan Muralikrishnan, Paul Fischill, Andreas Adelmann et al.
We present and compare distributed parallelization strategies for the particle-in-Fourier (PIF) schemes used in kinetic plasma simulations. The strategies are i) domain decomposition, where both the particles and the Fourier modes are split between the MPI ranks; ii) particle decomposition, where only the particles are split between the ranks and each rank carries all the modes; and iii) space-time decomposition, in which time parallelization based on the parareal algorithm is added on top of the particle decomposition. We describe the communication patterns involved in each strategy and the parameter regimes where each works best, and explain their advantages and disadvantages. We implement the strategies within the open-source, performance-portable library IPPL and conduct scaling studies with 3D-3V Landau damping and Penning trap benchmark problems on the Alps and JUWELS Booster supercomputers. We analyze the dominant component timings in each of the strategies and identify areas for future optimizations.
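For readers unfamiliar with parareal, the time-parallel ingredient of the space-time strategy, the following minimal sketch applies the standard parareal update to a scalar test equation u' = -u. The propagators, step counts, and the function name `parareal` are assumptions chosen for illustration; the particle-based propagators used with PIF in the paper are not reproduced here.

```python
import numpy as np

def euler(u, ta, tb, n_steps, lam=-1.0):
    """Forward Euler for u' = lam * u from ta to tb (illustrative propagator)."""
    dt = (tb - ta) / n_steps
    for _ in range(n_steps):
        u = u + dt * lam * u
    return u

def coarse(u, ta, tb):
    return euler(u, ta, tb, 1)      # cheap, inaccurate propagator G

def fine(u, ta, tb):
    return euler(u, ta, tb, 100)    # expensive, accurate propagator F

def parareal(u0, t0, t1, n_slices, n_iter):
    ts = np.linspace(t0, t1, n_slices + 1)
    u = np.empty(n_slices + 1)
    u[0] = u0
    # Initial guess: serial sweep with the coarse propagator
    for n in range(n_slices):
        u[n + 1] = coarse(u[n], ts[n], ts[n + 1])
    for _ in range(n_iter):
        # These fine solves are independent and would run in parallel
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        new = np.empty_like(u)
        new[0] = u0
        # Serial coarse correction: U_{n+1} = G(U_n^new) + F(U_n^old) - G(U_n^old)
        for n in range(n_slices):
            new[n + 1] = coarse(new[n], ts[n], ts[n + 1]) + f_old[n] - g_old[n]
        u = new
    return u

n_slices = 10
u_par = parareal(1.0, 0.0, 2.0, n_slices, n_iter=n_slices)

# Serial fine reference over the same time slices
ts = np.linspace(0.0, 2.0, n_slices + 1)
u_ref = np.empty(n_slices + 1)
u_ref[0] = 1.0
for n in range(n_slices):
    u_ref[n + 1] = fine(u_ref[n], ts[n], ts[n + 1])
```

With as many iterations as time slices, parareal reproduces the serial fine solution up to rounding; a speedup is only possible when far fewer iterations suffice.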
35.6 | CE | May 6
A Comparison of Massively Parallel Performance Portable Particle-in-Cell schemes for electrostatic kinetic plasma simulations
Sonali Mayani, Paul Fischill, Sriramkrishnan Muralikrishnan et al.
We compare different Poisson solvers within the context of an electrostatic Vlasov-Poisson system. These schemes are implemented as part of the IPPL (Independent Parallel Particle Layer) library (Frey et al., 2024), which provides performance-portable and dimension-independent building blocks for scientific simulations requiring particle-mesh methods, with Eulerian (mesh-based) and Lagrangian (particle-based) approaches. The simulation used to compare the performance and portability of the schemes is Landau damping, part of a set of mini-applications implemented to benchmark and showcase the capabilities of the IPPL library (Muralikrishnan et al., 2024). We use grid sizes of $512^3$ and $1024^3$ with 8 particles per cell, running with different algorithms in the solve phase of the Particle-in-Cell (PIC) loop: a Fast Fourier Transform (FFT) pseudo-spectral solver, a matrix-free finite difference Preconditioned Conjugate Gradient (PCG) solver, and a matrix-free Finite Element (FEM) solver. We also compare these PIC schemes to the novel Particle-in-Fourier (PIF) scheme, which performs interpolations using non-uniform FFTs, thereby avoiding a grid in real space. We obtain results on different computing architectures, namely AMD GPUs (LUMI at CSC) and Nvidia GPUs (Alps at CSCS and JUWELS Booster at Jülich Supercomputing Center), showcasing portability. In terms of absolute time the FFT solver is advantageous, but it is limited in its applicability. All other field solvers in the PIC scheme are an order of magnitude more expensive in terms of time, but scale similarly to the FFT case in the electrostatic PIC context. The PIF scheme serves as a high-fidelity alternative to standard PIC, and while it is costlier than the FFT-based PIC scheme, it shows excellent scalability on all the architectures.
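As a small illustration of the FFT pseudo-spectral approach mentioned above, the sketch below solves the periodic 1D Poisson problem -phi'' = rho by dividing by k^2 in Fourier space. The 1D setting, the sign convention, the zero-mean normalization, and the name `poisson_fft_1d` are assumptions for illustration; this is not the IPPL solver.

```python
import numpy as np

def poisson_fft_1d(rho, length):
    """Solve -phi'' = rho on a periodic domain of the given length.

    The mean of phi is set to zero; the mean of rho is ignored,
    since a periodic solution requires it to vanish.
    """
    n = rho.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers
    rho_hat = np.fft.fft(rho)
    phi_hat = np.zeros_like(rho_hat)
    nonzero = k != 0
    phi_hat[nonzero] = rho_hat[nonzero] / k[nonzero] ** 2
    return np.fft.ifft(phi_hat).real

# Hypothetical test case: rho = 9 sin(3x) has the exact solution phi = sin(3x)
length = 2 * np.pi
x = np.linspace(0.0, length, 64, endpoint=False)
phi = poisson_fft_1d(9 * np.sin(3 * x), length)
```

The restriction to periodic domains in this sketch hints at why such a solver is fast but limited in applicability.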
27.6 | CE | May 11
A Performance-Portable, Massively Parallel Distributed Nonuniform FFT
Paul Fischill, Andreas Adelmann, Sriramkrishnan Muralikrishnan
The nonuniform fast Fourier transform (NUFFT) enables spectral methods for problems with irregularly spaced samples, with applications in medical imaging, molecular dynamics, and kinetic plasma simulations. Existing implementations are limited to shared-memory execution, restricting problem sizes to what fits on a single node. We present the first distributed, performance-portable NUFFT for heterogeneous supercomputers. Our Kokkos-based implementation runs without modification on NVIDIA and AMD GPUs. We develop multiple spreading and interpolation kernels optimized for different accuracy requirements and architectures. Our spreading kernels match or exceed the single-GPU throughput of the state-of-the-art CUDA-based NUFFT library cuFINUFFT at production particle densities, while our Kokkos-based implementation additionally supports AMD GPUs. Strong scaling experiments on Alps (NVIDIA GH200), JUWELS Booster (NVIDIA A100), and LUMI (AMD MI250X) demonstrate scaling up to 1024 GPUs. At scale, the distributed FFT is a significant part of the total runtime, making higher NUFFT accuracy less expensive. We apply the method to massively parallel Particle-in-Fourier simulations of Landau damping with up to $1024^3$ Fourier modes and 8.6 billion particles on Alps, JUWELS, and LUMI, demonstrating that distributed NUFFTs enable kinetic plasma simulations at resolutions previously inaccessible to spectral particle methods.
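For orientation, the quantity a type-1 NUFFT approximates can be written as a direct sum over the nonuniform points. The sketch below evaluates that sum naively in O(NM) operations and is useful only as a small reference. The sign convention, the mode ordering, and the name `nudft_type1` are assumptions for illustration; it does not use the spreading kernels or the distributed FFT described in the abstract.

```python
import numpy as np

def nudft_type1(x, c, n_modes):
    """Direct evaluation of f_k = sum_j c_j * exp(-i k x_j).

    x: nonuniform points in [0, 2*pi); c: complex strengths.
    Modes are ordered k = -n_modes//2, ..., (n_modes - 1)//2.
    """
    k = np.arange(-(n_modes // 2), (n_modes + 1) // 2)
    return np.exp(-1j * np.outer(k, x)) @ c

# Sanity check on uniform points, where the sum reduces to an ordinary DFT
n = 16
rng = np.random.default_rng(0)
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_uniform = 2 * np.pi * np.arange(n) / n
f_direct = nudft_type1(x_uniform, c, n)
f_fft = np.fft.fftshift(np.fft.fft(c))
```

A fast algorithm replaces the dense sum by spreading the strengths onto a uniform grid, applying an FFT, and correcting for the spreading kernel.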
NA | Jun 1, 2017
iHDG: An Iterative HDG Framework for Partial Differential Equations
Sriramkrishnan Muralikrishnan, Minh-Binh Tran, Tan Bui-Thanh
We present a scalable iterative solver for high-order hybridized discontinuous Galerkin (HDG) discretizations of linear partial differential equations. It is an interplay between domain decomposition methods and HDG discretizations, and hence inherits advances from both sides. In particular, the method can be viewed as a Gauss-Seidel approach that requires only independent element-by-element and face-by-face local solves in each iteration. As such, it is well-suited for current and future computing systems with massive concurrency. Unlike conventional Gauss-Seidel schemes, which are purely algebraic, the convergence of iHDG, thanks to the built-in HDG numerical flux, does not depend on the ordering of unknowns. We rigorously show the convergence of the proposed method for the transport equation, the linearized shallow water equation, and the convection-diffusion equation. For the transport equation, the method is convergent regardless of mesh size $h$ and solution order $p$, and furthermore the convergence rate is independent of the solution order. For the linearized shallow water and the convection-diffusion equations we show that the convergence is conditional on both $h$ and $p$. Extensive steady and time-dependent numerical results for the 2D and 3D transport equations, the linearized shallow water equation, and the convection-diffusion equation are presented to verify the theoretical findings.
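For contrast with iHDG, the following minimal sketch shows the conventional, purely algebraic Gauss-Seidel iteration that the abstract refers to, in which each sweep updates unknowns sequentially in index order. The matrix, tolerance, and function name `gauss_seidel` are assumptions for illustration; this is not the iHDG algorithm, whose local solves are independent across elements and faces.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=10_000):
    """Classical Gauss-Seidel for A x = b; returns (x, sweeps performed)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for it in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(n):
            # Uses already-updated entries x[:i] and old entries x_old[i+1:],
            # so the result of a sweep depends on the ordering of unknowns
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            return x, it
    return x, max_iter

# Hypothetical diagonally dominant system, for which the iteration converges
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, sweeps = gauss_seidel(A, b)
```

The sequential dependence inside each sweep is what limits the concurrency of the algebraic scheme.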