MSApr 9, 2018
TSFC: a structure-preserving form compilerMiklós Homolya, Lawrence Mitchell, Fabio Luporini et al.
A form compiler takes a high-level description of the weak form of partial differential equations and produces low-level code that carries out the finite element assembly. In this paper we present the Two-Stage Form Compiler (TSFC), a new form compiler with the main motivation to maintain the structure of the input expression as long as possible. This facilitates the application of optimizations at the highest possible level of abstraction. TSFC features a novel, structure-preserving method for separating the contributions of a form to the subblocks of the local tensor in discontinuous Galerkin problems. This enables us to preserve the tensor structure of expressions longer through the compilation process than other form compilers. This is also achieved in part by a two-stage approach that cleanly separates the lowering of finite element constructs to tensor algebra in the first stage, from the scheduling of those tensor operations in the second stage. TSFC also efficiently traverses complicated expressions, and experimental evaluation demonstrates good compile-time performance even for highly complex forms.
COMP-PHApr 22, 2020Code
Scaling through abstractions -- high-performance vectorial wave simulations for seismic inversion with DevitoMathias Louboutin, Fabio Luporini, Philipp Witte et al.
[Devito] is an open-source Python project based on domain-specific language and compiler technology. Driven by the requirements of rapid HPC applications development in exploration seismology, the language and compiler have evolved significantly since inception. Sophisticated boundary conditions, tensor contractions, sparse operations and features such as staggered grids and sub-domains are all supported; operators of essentially arbitrary complexity can be generated. To accommodate this flexibility whilst ensuring performance, data dependency analysis is utilized to schedule loops and detect computational-properties such as parallelism. In this article, the generation and simulation of MPI-parallel propagators (along with their adjoints) for the pseudo-acoustic wave-equation in tilted transverse isotropic media and the elastic wave-equation are presented. Simulations are carried out on industry scale synthetic models in a HPC Cloud system and reach a performance of 28TFLOP/s, hence demonstrating Devito's suitability for production-grade seismic inversion problems.
DCJul 5, 2019
Automatic Differentiation for Adjoint Stencil LoopsJan Hückelheim, Navjot Kukreja, Sri Hari Krishna Narayanan et al.
Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.