Lucas C. Wilcox

NA
7 papers
123 citations
Novelty 31%
AI Score 37

7 Papers

NA · Feb 13, 2017
Acceleration of the Implicit-Explicit Non-hydrostatic Unified Model of the Atmosphere (NUMA) on Manycore Processors

Daniel S. Abdi, Francis X. Giraldo, Emil M. Constantinescu et al.

We present the acceleration of an IMplicit-EXplicit (IMEX) non-hydrostatic atmospheric model on manycore processors such as GPUs and Intel's MIC architecture. IMEX time integration methods sidestep the constraint imposed by the Courant-Friedrichs-Lewy condition on explicit methods through corrective implicit solves within each time step. In this work, we implement and evaluate the performance of IMEX on manycore processors relative to explicit methods. Using 3D-IMEX at Courant number C=15, we obtained a speedup of about 4X relative to an explicit time stepping method run with the maximum allowable C=1. In addition, we demonstrate a much larger speedup of 100X at C=150 using 1D-IMEX due to the unconditional stability of the method in the vertical direction. Several improvements on the IMEX procedure were necessary in order to outperform our results with explicit methods: a) reducing the number of degrees of freedom of the IMEX formulation by forming the Schur complement; b) formulating a horizontally-explicit vertically-implicit (HEVI) 1D-IMEX scheme that has a lower workload and potentially better scalability than 3D-IMEX; c) using high-order polynomial preconditioners to reduce the condition number of the resulting system; d) using a direct solver for the 1D-IMEX method by performing and storing LU factorizations once to obtain a constant cost for any Courant number. Without all of these improvements, explicit time integration methods turned out to be difficult to beat. We discuss in detail the IMEX infrastructure required for formulating and implementing efficient methods on manycore processors. Finally, we validate our results with standard benchmark problems in NWP and evaluate the performance and scalability of the IMEX method using up to 4192 GPUs and 16 Knights Landing processors.
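To illustrate why an implicit treatment of the stiff part lifts the explicit time-step limit mentioned in the abstract, here is a minimal sketch on a scalar toy problem, u' = (slow + fast) u. It is not the NUMA scheme: the first-order forward/backward Euler splitting, the coefficients, and all function names are assumptions chosen for illustration only.

```python
# Toy IMEX illustration (assumed first-order scheme, not the paper's method).
# The "slow" term is advanced explicitly, the stiff "fast" term implicitly.

def explicit_euler_step(u, dt, slow, fast):
    """Forward Euler on the full right-hand side."""
    return u + dt * (slow + fast) * u

def imex_euler_step(u, dt, slow, fast):
    """Solve u_new = u + dt*slow*u + dt*fast*u_new for u_new."""
    return (1.0 + dt * slow) * u / (1.0 - dt * fast)

def integrate(step, u0, dt, n_steps, slow, fast):
    """Apply the given one-step method n_steps times."""
    u = u0
    for _ in range(n_steps):
        u = step(u, dt, slow, fast)
    return u

# Hypothetical coefficients: forward Euler is stable only for dt <= 2/1001.
SLOW, FAST = -1.0, -1000.0
DT, N_STEPS = 0.01, 100  # about five times the explicit stability limit

u_explicit = integrate(explicit_euler_step, 1.0, DT, N_STEPS, SLOW, FAST)
u_imex = integrate(imex_euler_step, 1.0, DT, N_STEPS, SLOW, FAST)

print("explicit:", u_explicit)  # grows without bound
print("IMEX:    ", u_imex)      # decays, like the exact solution
```

At this step size the explicit update multiplies the solution by a factor of magnitude greater than one each step, while the IMEX update stays contractive. In a PDE setting the scalar division becomes a linear solve, which is where the Schur complement, preconditioning, and stored LU factorizations listed in the abstract come in.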

NA · Jun 13, 2018
Discretely entropy stable weight-adjusted discontinuous Galerkin methods on curvilinear meshes

Jesse Chan, Lucas C. Wilcox

We construct entropy conservative and entropy stable high order accurate discontinuous Galerkin (DG) discretizations for time-dependent nonlinear hyperbolic conservation laws on curvilinear meshes. The resulting schemes preserve a semi-discrete quadrature approximation of a continuous global entropy inequality. The proof requires the satisfaction of a discrete geometric conservation law, which we enforce through an appropriate polynomial approximation. We extend the construction of entropy conservative and entropy stable DG schemes to the case when high order accurate curvilinear mass matrices are approximated using low-storage weight-adjusted approximations, and describe how to retain global conservation properties under such an approximation. The theoretical results are verified through numerical experiments for the compressible Euler equations on triangular and tetrahedral meshes.

NA · Feb 10, 2017
Solving 1D Conservation Laws Using Pontryagin's Minimum Principle

Wei Kang, Lucas C. Wilcox

This paper discusses a connection between scalar convex conservation laws and Pontryagin's minimum principle. For flux functions for which an associated optimal control problem can be found, a minimum value solution of the conservation law is proposed. For scalar space-independent convex conservation laws such a control problem exists and the minimum value solution of the conservation law is equivalent to the entropy solution. This can be seen as a generalization of the Lax-Oleinik formula to convex (not necessarily uniformly convex) flux functions. Using Pontryagin's minimum principle, an algorithm for finding the minimum value solution pointwise of scalar convex conservation laws is given. Numerical examples of approximating the solution of both space-dependent and space-independent conservation laws are provided to demonstrate the accuracy and applicability of the proposed algorithm. Furthermore, a MATLAB routine using Chebfun is provided (along with demonstration code on how to use it) to approximately solve scalar convex conservation laws with space-independent flux functions.
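For context, the classical Lax-Oleinik formula that the abstract refers to can be stated as follows. This is the standard textbook form for a uniformly convex flux, not the paper's generalization, and the symbols u_0, f^*, and y^* are notation introduced here for illustration.

```latex
% Conservation law: u_t + f(u)_x = 0, with u(x,0) = u_0(x) and f uniformly convex.
% Legendre transform of the flux:
f^*(q) = \sup_{p} \bigl( p\,q - f(p) \bigr)

% Minimization over the foot point y of the characteristic through (x,t):
y^*(x,t) \in \operatorname*{arg\,min}_{y}
  \left\{ \int_0^{y} u_0(s)\,ds + t\, f^*\!\left( \frac{x - y}{t} \right) \right\}

% Entropy solution, for almost every x at each t > 0:
u(x,t) = (f')^{-1}\!\left( \frac{x - y^*(x,t)}{t} \right)
```

The formula evaluates the entropy solution pointwise through a minimization, which is the same structure the abstract's minimum value solution exploits.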

PL · Apr 13, 2016
Array Program Transformation with Loo.py by Example: High-Order Finite Elements

Andreas Klöckner, Lucas C. Wilcox, T. Warburton

To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines as found in a weather model to a single high-performance computational kernel suitable for execution on modern GPU hardware. Along the transformation path, we encounter kernel fusion, vectorization, prefetching, parallelization, and algorithmic changes achieved by mechanized conversion between imperative and functional/substitution-based code, among a number of others. We conclude with performance results that demonstrate the effects and support the effectiveness of the applied transformations.

2.9 · NA · May 15
GPU Performance of an Entropy-Stable Discontinuous Galerkin Euler Solver with Non-Conservative Terms

Henry Waterhouse, Maciej Waruszewski, Lucas C. Wilcox et al.

The entropy-stable discontinuous Galerkin method for compressible Euler equations with buoyancy is implemented on graphics processing unit (GPU) hardware. We measure the performance of the solver on three-dimensional problems: the rising thermal bubble and the baroclinic instability in a channel. On NVIDIA A100 hardware, the solver achieves nearly 70% of 64-bit floating-point peak performance for the most computationally expensive kernel (volume terms) and significantly reduces the computational overhead typically incurred by two-point entropy-stable fluxes in the volume terms. We also present impressive strong and weak scaling performance of the solver and compare to a highly-optimized central processing unit (CPU) code, showing that the GPU kernels are 10× faster and more than 13× more energy efficient than the CPU code. We also show that the solver achieves the expected 2× speedup when run at 32-bit floating-point peak performance. We discuss the different modifications that we implemented to reach the final form of the GPU implementation and measure the performance gain of each of the implementation strategies, ranging from reductions in complex operations and memory traffic to load balancing. We also extend symmetry-based flux savings to the non-symmetric gravity term, preserving nearly the full factor-of-two speedup achieved for the symmetric flux.

DC · Nov 5, 2015
Strong Scaling for Numerical Weather Prediction at Petascale with the Atmospheric Model NUMA

Andreas Müller, Michal A. Kopera, Simone Marras et al.

Numerical weather prediction (NWP) has proven to be computationally challenging due to its inherent multiscale nature. Currently, the highest resolution NWP models use a horizontal resolution of about 10 km. In order to increase the resolution of NWP models, highly scalable atmospheric models are needed. The Non-hydrostatic Unified Model of the Atmosphere (NUMA), developed by the authors at the Naval Postgraduate School, was designed to achieve this purpose. NUMA is used by the Naval Research Laboratory, Monterey as the engine inside its next generation weather prediction system NEPTUNE. NUMA solves the fully compressible Navier-Stokes equations by means of high-order Galerkin methods (both spectral element as well as discontinuous Galerkin methods can be used). Mesh generation is done using the p4est library. NUMA is capable of running middle and upper atmosphere simulations since it does not make use of the shallow-atmosphere approximation. This paper presents the performance analysis and optimization of the spectral element version of NUMA. The performance at different optimization stages is analyzed using a theoretical performance model as well as measurements via hardware counters. Machine independent optimization is compared to machine specific optimization using BG/Q vector intrinsics. By using vector intrinsics the main computations reach 1.2 PFlops on the entire machine Mira (12% of the theoretical peak performance). The paper also presents scalability studies for two idealized test cases that are relevant for NWP applications. The atmospheric model NUMA delivers an excellent strong scaling efficiency of 99% on the entire supercomputer Mira using a mesh with 1.8 billion grid points. This makes it possible to run a global forecast of a baroclinic wave test case at 3 km uniform horizontal resolution and double precision within the time frame required for operational weather prediction.

NA · Sep 28, 2015
Stable Coupling of Nonconforming, High-Order Finite Difference Methods

Jeremy E. Kozdon, Lucas C. Wilcox

A methodology for handling block-to-block coupling of nonconforming, multiblock summation-by-parts finite difference methods is proposed. The coupling is based on the construction of projection operators that move a finite difference grid solution along an interface to a space of piecewise defined functions; we specifically consider discontinuous, piecewise polynomial functions. The constructed projection operators are compatible with the underlying summation-by-parts energy norm. Using the linear wave equation in two dimensions as a model problem, energy stability of the coupled numerical method is proven for the case of curved, nonconforming block-to-block interfaces. To further demonstrate the power of the coupling procedure, we show how it allows for the development of a provably energy stable coupling between curvilinear finite difference methods and a curved-triangle discontinuous Galerkin method. The theoretical results are verified through numerical simulations on curved meshes as well as eigenvalue analysis.
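The summation-by-parts (SBP) property that the abstract builds on can be checked on the classical second-order operator. The sketch below is a single-block, one-dimensional illustration with hypothetical function names. It does not reproduce the paper's projection operators or curvilinear coupling. It builds a diagonal norm H and a matrix Q with D = H^{-1} Q, then verifies Q + Q^T = diag(-1, 0, ..., 0, 1).

```python
# Classical second-order SBP first-derivative operator on a uniform grid.
# Pure-Python sketch; names and grid parameters are illustrative assumptions.

def sbp_operators(n, h):
    """Return (H_diag, Q) for n >= 2 grid points with spacing h."""
    H = [h] * n
    H[0] = H[-1] = h / 2.0           # boundary quadrature weights
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        Q[i][i + 1] = 0.5            # skew-symmetric interior part
        Q[i + 1][i] = -0.5
    Q[0][0] = -0.5                   # boundary closures
    Q[-1][-1] = 0.5
    return H, Q

def apply_derivative(H, Q, u):
    """Compute D u = H^{-1} Q u."""
    n = len(u)
    return [sum(Q[i][j] * u[j] for j in range(n)) / H[i] for i in range(n)]

def sbp_residual(Q):
    """Largest entry of |Q + Q^T - B|, where B = diag(-1, 0, ..., 0, 1)."""
    n = len(Q)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            b = 0.0
            if i == j == 0:
                b = -1.0
            elif i == j == n - 1:
                b = 1.0
            worst = max(worst, abs(Q[i][j] + Q[j][i] - b))
    return worst

N, SPACING = 11, 0.1
H, Q = sbp_operators(N, SPACING)
u = [2.0 * (i * SPACING) + 1.0 for i in range(N)]   # linear test function
du = apply_derivative(H, Q, u)
print("SBP residual:", sbp_residual(Q))
print("derivative of 2x + 1:", du)
```

The identity Q + Q^T = B is the discrete analogue of integration by parts, and it is what allows energy estimates in the H-norm to mirror the continuous ones.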