Tobias Weinzierl

h-index: 15

Papers: 4

Citations: 867

Novelty: 52%

AI Score: 24

Ranked #168,843 of 194,257 authors (top 87%); #9 in MS (top 50%)

4 Papers

1.2 | NA | Aug 11, 2018

Efficient implementation of ADER discontinuous Galerkin schemes for a scalable hyperbolic PDE engine

Michael Dumbser, Francesco Fambri, Maurizio Tavelli et al.

In this paper we discuss a new and very efficient implementation of high order accurate ADER discontinuous Galerkin (ADER-DG) finite element schemes on modern massively parallel supercomputers. The numerical methods apply to a very broad class of nonlinear systems of hyperbolic partial differential equations. ADER-DG schemes are by construction communication-avoiding and cache-blocking and are furthermore very well-suited for vectorization, so that they appear to be a good candidate for the future generation of exascale supercomputers. We introduce the numerical algorithm and show some applications to a set of hyperbolic equations of increasing complexity, ranging from the compressible Euler equations through the equations of linear elasticity and the unified Godunov-Peshkov-Romenski (GPR) model of continuum mechanics to general relativistic magnetohydrodynamics (GRMHD) and the Einstein field equations of general relativity. We present strong scaling results of the new ADER-DG schemes up to 180,000 CPU cores. To our knowledge, these are the largest runs ever carried out with high order ADER-DG schemes for nonlinear hyperbolic PDE systems. We also provide a detailed performance comparison with traditional Runge-Kutta DG schemes.

1.2 | MS | Jan 26, 2018

Stop talking to me -- a communication-avoiding ADER-DG realisation

Dominic E. Charrier, Tobias Weinzierl

We present a communication- and data-sensitive formulation of ADER-DG for hyperbolic differential equation systems. Sensitive here has multiple flavours: First, the formulation reduces the persistent memory footprint. This reduces pressure on the memory subsystem. Second, the formulation realises the underlying predictor-corrector scheme with single-touch semantics, i.e., each degree of freedom is read on average only once per time step from the main memory. This reduces communication through the memory controllers. Third, the formulation breaks up the tight coupling of the explicit time stepping's algorithmic steps to mesh traversals. This averages out data access peaks. Different operations and algorithmic steps are run on different grid entities. Finally, the formulation hides distributed memory data transfer behind the computation aligned with the mesh traversal. This reduces pressure on the machine interconnects. All techniques applied by our formulation are elaborated by means of a rigorous task formalism. They break up ADER-DG's tight causal coupling of compute steps and can be generalised to other predictor-corrector schemes.

7.5 | DC | Jun 22

Memory Layouts for GPU-Data Transfer Buffering in SPH

Mladen Ivkovic, Abouzied M. A. Nasar, Tobias Weinzierl et al.

The rise in GPU compute speed has outpaced improvements in host-to-device memory transfer speeds, despite the advent of shared-memory superchips. Consequently, memory transfer times now constitute an increasingly large fraction of total time-to-solution, compelling developers to compress GPU kernel input and output data into compact, minimal formats prior to GPU-offloading. This complements existing work on GPU- and compute-friendly data arrangements. We study a Smoothed Particle Hydrodynamics solver and propose memory layout strategies for host-side particle data that are particularly well-suited to GPU-offloading. Specifically, we advocate replacing classic array-of-struct data structures with a split array-of-struct arrangement, in which each logical struct decomposes into substructs determined by kernel read/write access patterns and attribute types. Splitting a monolithic particle struct into several bespoke, finer-grained structs can reduce the time required to pack data to and from buffers by roughly 20% to 40%, lowering total time spent on GPU-offloading by roughly 12% to 25%.
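The split array-of-struct idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the particle attributes, the hypothetical density kernel that reads five attributes and writes one, and the function names are not taken from the paper. The sketch only shows why a substruct that matches a kernel's read set can be copied into a transfer buffer in bulk, whereas a monolithic record has to be gathered attribute by attribute.

```python
from array import array

# Hypothetical monolithic particle record:
# (x, y, z, mass, smoothing_length, density, particle_id)
# Assumption: a density kernel reads the first five attributes and writes density.

def pack_inputs_from_monolithic(particles):
    """Gather the kernel inputs attribute by attribute from monolithic records."""
    buffer = array("d")
    for x, y, z, mass, smoothing_length, _density, _particle_id in particles:
        buffer.extend((x, y, z, mass, smoothing_length))
    return buffer.tobytes()

def split_particles(particles):
    """Store each particle as three substructs, grouped by kernel access pattern."""
    kernel_inputs = array("d")   # read-only for the kernel, 5 doubles per particle
    kernel_outputs = array("d")  # written by the kernel, 1 double per particle
    untouched = array("q")       # integer attribute the kernel never accesses
    for x, y, z, mass, smoothing_length, density, particle_id in particles:
        kernel_inputs.extend((x, y, z, mass, smoothing_length))
        kernel_outputs.append(density)
        untouched.append(particle_id)
    return kernel_inputs, kernel_outputs, untouched

def pack_inputs_from_split(kernel_inputs):
    """With the split layout the input substructs are already contiguous."""
    return kernel_inputs.tobytes()  # one bulk copy, no per-attribute gather
```

Both packing routes produce the same byte buffer; the split layout simply moves the gather step out of the per-offload path. Whether this yields a speed-up comparable to the figures quoted in the abstract depends on the real solver and hardware, which this sketch does not model.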

3.6 | SE | Oct 18, 2021

Doubt and Redundancy Kill Soft Errors -- Towards Detection and Correction of Silent Data Corruption in Task-based Numerical Software

Philipp Samfass, Tobias Weinzierl, Anne Reinarz et al.

Resilient algorithms in high-performance computing are subject to rigorous non-functional constraints. Resiliency must not increase the runtime, memory footprint or I/O demands too significantly. We propose a task-based soft error detection scheme that relies on error criteria per task outcome. They formalise how "dubious" an outcome is, i.e. how likely it is to contain an error. Our whole simulation is replicated once, forming two teams of MPI ranks that share their task results. Thus, ideally each team handles only around half of the workload. If a task yields large error criteria values, i.e. is dubious, we compute the task redundantly and compare the outcomes. Whenever they disagree, the task result with the lower error likelihood is accepted. We obtain a self-healing, resilient algorithm which can compensate silent floating-point errors without a significant performance, I/O or memory footprint penalty. Case studies, however, suggest that a careful, domain-specific tailoring of the error criteria remains essential.
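The decision logic described in this abstract can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions, not the authors' implementation: `resolve_task`, `double_task`, `magnitude_criterion` and the threshold value are hypothetical names and choices, and the MPI-based sharing between the two teams is reduced to a plain function argument.

```python
recomputed = []  # records which tasks were computed redundantly (for illustration)

def double_task(task):
    """Hypothetical task: doubles its input."""
    recomputed.append(task)
    return 2.0 * task

def magnitude_criterion(outcome):
    """Hypothetical error criterion: larger magnitudes count as more dubious."""
    return abs(outcome)

def resolve_task(task, compute, criterion, shared_outcome, threshold):
    """Decide which outcome to accept for a task whose result the other team shared."""
    # A shared outcome with a small criterion value is trusted without recomputation.
    if criterion(shared_outcome) <= threshold:
        return shared_outcome
    # The outcome is dubious: compute the task redundantly and compare.
    own_outcome = compute(task)
    if own_outcome == shared_outcome:
        return own_outcome
    # The outcomes disagree: accept the one with the lower criterion value.
    return min(own_outcome, shared_outcome, key=criterion)
```

For example, `resolve_task(4.0, double_task, magnitude_criterion, 8.0e9, 100.0)` treats the shared value as dubious, recomputes the task and accepts the recomputed outcome, while a shared value of `8.0` is accepted without any recomputation. As the abstract notes, a useful criterion has to be tailored to the application; the magnitude check here is only a placeholder.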