Dec 19, 2018

Automatic Code Generation for High-Performance Discontinuous Galerkin Methods on Modern Architectures

arXiv:1812.08075 | 31 citations | h-index: 31 | Has Code
Originality: Incremental advance
AI Analysis

This work provides a performance-portable code generation solution for Discontinuous Galerkin methods, addressing the challenge of SIMD vectorization for domain scientists using the Dune framework.

The authors tackle the sustainability and performance portability issue of SIMD vectorization in high-performance computing by enriching the dune-pdelab framework with a code generation approach that combines UFL and loopy. They achieve 40% to 60% of theoretical peak performance for matrix-free operator application on AVX2 and AVX512 architectures.

SIMD vectorization has lately become a key challenge in high-performance computing. However, hand-written explicitly vectorized code often poses a threat to the software's sustainability. In this publication we solve this sustainability and performance portability issue by enriching the simulation framework dune-pdelab with a code generation approach. The approach is based on the well-known domain-specific language UFL, but combines it with loopy, a more powerful intermediate representation for the computational kernel. Given this flexible tool, we present and implement a new class of vectorization strategies for the assembly of Discontinuous Galerkin methods on hexahedral meshes exploiting the finite element's tensor product structure. The optimal variant from this class is chosen by the code generator through an autotuning approach. The implementation is done within the open source PDE software framework Dune and the discretization module dune-pdelab. The strength of the proposed approach is illustrated with performance measurements for DG schemes for a scalar diffusion reaction equation and the Stokes equation. In our measurements, we utilize both the AVX2 and the AVX512 instruction set, achieving 40% to 60% of the machine's theoretical peak performance for one matrix-free application of the operator.
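
To illustrate what "exploiting the finite element's tensor product structure" can mean in practice, here is a minimal sketch in plain Python. It is not the paper's generated code and does not use the dune-pdelab, UFL, or loopy APIs. The function names, the matrix `A` (values of 1D basis functions at 1D quadrature points), and the coefficient array `x` are all hypothetical, and the example is 2D for brevity. It shows that evaluating a tensor-product basis one direction at a time gives the same result as the direct sum over all basis functions, with less work.

```python
# Sketch only: 2D tensor-product basis evaluation, written two ways.
# A[q][i] = value of 1D basis function i at 1D quadrature point q (hypothetical data).

def evaluate_naive(A, x):
    """u[q0][q1] = sum_{i0,i1} A[q0][i0] * A[q1][i1] * x[i0][i1]; O(m^2 n^2) work."""
    m, n = len(A), len(A[0])
    return [[sum(A[q0][i0] * A[q1][i1] * x[i0][i1]
                 for i0 in range(n) for i1 in range(n))
             for q1 in range(m)]
            for q0 in range(m)]

def evaluate_factorized(A, x):
    """Same result, one direction at a time; O(m n^2 + m^2 n) work."""
    m, n = len(A), len(A[0])
    # Step 1: contract the first index.
    # tmp[q0][i1] = sum_{i0} A[q0][i0] * x[i0][i1]
    tmp = [[sum(A[q0][i0] * x[i0][i1] for i0 in range(n)) for i1 in range(n)]
           for q0 in range(m)]
    # Step 2: contract the second index.
    # u[q0][q1] = sum_{i1} A[q1][i1] * tmp[q0][i1]
    return [[sum(A[q1][i1] * tmp[q0][i1] for i1 in range(n)) for q1 in range(m)]
            for q0 in range(m)]
```

Each step of the factorized version is a small dense matrix product, which is the kind of regular inner loop that SIMD units handle well. How such loops are actually mapped to vector lanes is the subject of the vectorization strategies described in the abstract and is not modelled here.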

