Architecture-aware $h$-to-$p$ optimisation: spectral/$hp$ element operators for mixed-element meshes
This work addresses performance bottlenecks in computational fluid dynamics and related fields by enabling efficient finite element simulations on complex geometries. It is incremental in that it builds on prior optimisation efforts for hexahedral elements.
The paper tackles the challenge of optimising spectral element methods on mixed-element meshes for GPUs and vectorised CPUs. It shows that implementation strategies tailored to the element shape and polynomial order can achieve high performance: in GPU tests, the Helmholtz operator on tetrahedral elements is at most 2.5 times slower than on hexahedral elements, despite its higher computational cost.
We extend earlier international efforts to optimise hexahedral-based spectral element methods on GPUs and vectorised CPUs to mixed-element meshes that additionally involve prismatic, pyramidal, and tetrahedral shapes using tensorial expansions. We demonstrate that, to achieve optimal performance, common finite element operators (such as the mass and Helmholtz matrices) benefit from alternative implementation strategies depending on the element shape, the choice of polynomial order, and the system architecture. In addition, we introduce a new approach, or interpretation, for efficiently evaluating more complex operations whose integrand involves inner products with the derivative of the expansions, such as the stiffness matrix. This approach seeks to maximise the number of operations that use the collocation properties of the nodal tensorial expansion associated with classical quadrature rules. Our GPU performance tests demonstrate that the throughput of the Helmholtz operator on tetrahedral elements is at most 2.5 times lower than on hexahedral elements, despite tetrahedra requiring six times as many floating-point operations.
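The collocation property mentioned above can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation: it assumes a Lagrange (nodal) basis whose nodes coincide with a three-point Gauss-Lobatto-Legendre rule, and all function names are hypothetical. Under that assumption the basis matrix evaluated at the quadrature points is the identity, so the quadrature-evaluated mass operator $B^T W B$ reduces to a pointwise scaling by the weights.

```python
from fractions import Fraction

# Three-point Gauss-Lobatto-Legendre rule on [-1, 1] (exact for degree <= 3).
# Assumption for this sketch: the nodal basis uses these same points.
NODES = [Fraction(-1), Fraction(0), Fraction(1)]
WEIGHTS = [Fraction(1, 3), Fraction(4, 3), Fraction(1, 3)]


def lagrange(j, x, nodes=NODES):
    """Value at x of the j-th Lagrange polynomial through `nodes`."""
    value = Fraction(1)
    for m, xm in enumerate(nodes):
        if m != j:
            value *= (x - xm) / (nodes[j] - xm)
    return value


def basis_matrix(points, nodes=NODES):
    """B[q][j] = l_j(points[q]): each basis function at each point."""
    return [[lagrange(j, x, nodes) for j in range(len(nodes))] for x in points]


def mass_apply_general(B, weights, u):
    """Apply B^T W B to u using explicit matrix-vector products."""
    n_q, n_m = len(B), len(B[0])
    at_quad = [sum(B[q][j] * u[j] for j in range(n_m)) for q in range(n_q)]
    weighted = [weights[q] * at_quad[q] for q in range(n_q)]
    return [sum(B[q][i] * weighted[q] for q in range(n_q)) for i in range(n_m)]


def mass_apply_collocated(weights, u):
    """Same operator when B is the identity: scale each value by its weight."""
    return [w * uj for w, uj in zip(weights, u)]


B = basis_matrix(NODES)                      # identity, since l_j(x_i) = delta_ij
u = [Fraction(2), Fraction(-1), Fraction(5)]
general = mass_apply_general(B, WEIGHTS, u)
collocated = mass_apply_collocated(WEIGHTS, u)
```

Here `general` and `collocated` agree, but the collocated path skips both matrix products. In a tensor-product element the same saving would apply direction by direction, which is presumably why maximising collocated operations is attractive. Operators that involve derivatives still need a differentiation step in this sketch's setting.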