Andreas Klöckner

NA
14 papers
1,000 citations
Novelty 43%
AI Score 38

14 Papers

NA · Apr 3, 2009
Nodal Discontinuous Galerkin Methods on Graphics Processors

Andreas Klöckner, Tim Warburton, Jeffrey Bridge et al.

Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling. The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG's high-order nature means it requires fewer data points per represented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG. Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell's equations on a general 3D unstructured grid by a factor of 40 to 60 relative to a serial computation on a current-generation CPU. In many cases, our algorithms make full use of the device's available memory bandwidth. Example computations reach and exceed 200 GFLOP/s of net application-level floating-point work. In this article, we describe and derive the techniques used to reach this level of performance. In addition, we present comprehensive data on the accuracy and runtime behavior of the method.
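The element-local structure and the arithmetic-intensity argument in this abstract can be illustrated with a small NumPy sketch. It is a 1D stand-in, not the paper's 3D CUDA implementation: the Chebyshev-Gauss-Lobatto nodes, the helper names (`reference_diff_matrix`, `apply_local_derivative`, `arithmetic_intensity`) and the simple byte count are all illustrative assumptions. The volume derivative becomes one small dense matrix applied independently to every element, and counting flops against bytes moved shows the intensity approaching n_nodes/8 flops per byte (double precision) when there are many elements, so it grows with the order.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def reference_diff_matrix(order):
    """Nodal differentiation matrix on [-1, 1] at Chebyshev-Gauss-Lobatto points (assumed node set)."""
    x = -np.cos(np.pi * np.arange(order + 1) / order)
    V = leg.legvander(x, order)                        # V[i, n] = P_n(x_i)
    eye = np.eye(order + 1)
    # Vx[i, n] = P_n'(x_i), built by differentiating each Legendre basis polynomial
    Vx = np.column_stack([leg.legval(x, leg.legder(eye[n])) for n in range(order + 1)])
    return x, Vx @ np.linalg.inv(V)                    # D = Vx V^{-1}


def apply_local_derivative(u, D, h):
    """Differentiate every element at once; u has shape (n_elements, n_nodes).

    Each element only touches its own nodal values: the whole volume term is a
    single batched product with the same small reference matrix D, scaled by
    the affine map factor d(xi)/dx = 2/h.
    """
    return (2.0 / h) * np.einsum("ij,kj->ki", D, u)


def arithmetic_intensity(n_nodes, n_elements, bytes_per_value=8):
    """Flops per byte for the batched product (read u and D once, write du once)."""
    flops = 2 * n_elements * n_nodes**2
    bytes_moved = bytes_per_value * (2 * n_elements * n_nodes + n_nodes**2)
    return flops / bytes_moved
```

Because no element reads another element's data in this step, each element (or a small batch of elements) could be handled by an independent GPU thread group; the face-coupling terms of a real DG operator are omitted here.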

NA · Mar 17, 2013
Quadrature by Expansion: A New Method for the Evaluation of Layer Potentials

Andreas Klöckner, Alexander Barnett, Leslie Greengard et al.

Integral equation methods for the solution of partial differential equations, when coupled with suitable fast algorithms, yield geometrically flexible, asymptotically optimal and well-conditioned schemes in either interior or exterior domains. The practical application of these methods, however, requires the accurate evaluation of boundary integrals with singular, weakly singular or nearly singular kernels. Historically, these issues have been handled either by low-order product integration rules (computed semi-analytically), by singularity subtraction/cancellation, by kernel regularization and asymptotic analysis, or by the construction of special purpose "generalized Gaussian quadrature" rules. In this paper, we present a systematic, high-order approach that works for any singularity (including hypersingular kernels), based only on the assumption that the field induced by the integral operator is locally smooth when restricted to either the interior or the exterior. Discontinuities in the field across the boundary are permitted. The scheme, denoted QBX (quadrature by expansion), is easy to implement and compatible with fast hierarchical algorithms such as the fast multipole method. We include accuracy tests for a variety of integral operators in two dimensions on smooth and corner domains.
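The core QBX idea (form a local expansion about a center pushed off the surface using ordinary smooth quadrature, then evaluate that expansion back at the on-surface target) can be sketched for a deliberately simple case: the Cauchy integral (1/(2πi))∮ σ(w)/(w - z) dw on the unit circle, whose real part is, up to sign convention, a Laplace double-layer potential. This is not the paper's implementation; the unit-circle geometry, the trapezoidal rule and the parameter values `n_nodes`, `r` and `order` are illustrative assumptions. For a density that extends analytically into the disk, such as σ(w) = w², the interior limit at a target t equals σ(t), which makes the result easy to check.

```python
import numpy as np


def qbx_interior_limit(density, theta0, n_nodes=200, r=0.2, order=4):
    """Interior boundary limit at exp(i*theta0) of the Cauchy integral on the unit circle.

    Sketch only: expansion center placed a distance r inside the circle,
    coefficients computed with the periodic trapezoidal rule.
    """
    theta = 2.0 * np.pi * np.arange(n_nodes) / n_nodes
    w = np.exp(1j * theta)                 # quadrature nodes on the unit circle
    sigma = density(w)
    target = np.exp(1j * theta0)           # on-surface target point
    center = (1.0 - r) * target            # expansion center, distance r inside
    # Local-expansion coefficients a_k = (1/(2*pi*i)) * integral of sigma / (w - c)^(k+1) dw.
    # With dw = i*w*dtheta, the trapezoidal rule reduces to a plain mean over the nodes.
    coeffs = [np.mean(sigma * w / (w - center) ** (k + 1)) for k in range(order + 1)]
    # Evaluate the truncated expansion back at the on-surface target.
    return sum(a * (target - center) ** k for k, a in enumerate(coeffs))
```

The quadrature only ever sees integrands whose nearest singularity lies a distance `r` from the curve; the singular on-surface integral itself is never discretized directly.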

NA · Mar 18, 2011
Viscous Shock Capturing in a Time-Explicit Discontinuous Galerkin Method

Andreas Klöckner, Tim Warburton, Jan S. Hesthaven

We present a novel, cell-local shock detector for use with discontinuous Galerkin (DG) methods. The output of this detector is a reliably scaled, element-wise smoothness estimate which is suited as a control input to a shock-capturing mechanism. Using an artificial viscosity in the latter role, we obtain a DG scheme for the numerical solution of nonlinear systems of conservation laws. Building on work by Persson and Peraire, we thoroughly justify the detector's design and analyze its performance on a number of benchmark problems. We further explain the scaling and smoothing steps necessary to turn the output of the detector into a local, artificial viscosity. We close by providing an extensive array of numerical tests of the detector in use.
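The Persson and Peraire baseline that this work builds on can be sketched as follows: measure the fraction of an element's L² energy carried by its highest Legendre mode, take log10 of it, and ramp an artificial viscosity in with a sine profile. The sketch works on the reference interval [-1, 1] and is not the refined detector proposed in the paper; the thresholds `s0`, `kappa`, `eps0`, the 100-point quadrature and the tanh test profile are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def modal_coefficients(u, degree, n_quad=100):
    """Legendre coefficients of the L2 projection of u onto polynomials of the given degree."""
    x, w = leg.leggauss(n_quad)
    ux = u(x)
    eye = np.eye(degree + 1)
    # c_n = (2n+1)/2 * integral of u * P_n, since the integral of P_n^2 is 2/(2n+1)
    return np.array([(2 * n + 1) / 2 * np.sum(w * ux * leg.legval(x, eye[n]))
                     for n in range(degree + 1)])


def smoothness_indicator(u, degree):
    """log10 of the share of L2 energy in the highest mode (Persson-Peraire style)."""
    c = modal_coefficients(u, degree)
    n = np.arange(degree + 1)
    energy = c**2 * 2 / (2 * n + 1)        # L2 energy of each mode c_n * P_n
    return np.log10(energy[-1] / energy.sum())


def artificial_viscosity(s, s0, kappa, eps0):
    """Sine ramp from 0 (s < s0 - kappa) up to eps0 (s > s0 + kappa)."""
    if s < s0 - kappa:
        return 0.0
    if s > s0 + kappa:
        return eps0
    return 0.5 * eps0 * (1.0 + np.sin(np.pi * (s - s0) / (2.0 * kappa)))
```

A smooth profile such as `np.exp` leaves only a tiny share of energy in the top mode, while a steep front such as `tanh(20 * (x - 0.3))` pushes the indicator several orders of magnitude higher, which is what the viscosity ramp keys on.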

NA · Apr 22, 2013
On the convergence of local expansions of layer potentials

Charles L. Epstein, Leslie Greengard, Andreas Klöckner

In a recently developed quadrature method (quadrature by expansion or QBX), it was demonstrated that weakly singular or singular layer potentials can be evaluated rapidly and accurately on the surface by making use of local expansions about carefully chosen off-surface points. In this paper, we derive estimates for the rate of convergence of these local expansions, providing the analytic foundation for the QBX method. The estimates may also be of mathematical interest, particularly for microlocal or asymptotic analysis in potential theory.

MS · Nov 2, 2012
High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

Andreas Klöckner, Timothy Warburton, Jan S. Hesthaven

Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we showed that DG methods also adapt readily to execution on modern, massively parallel graphics processors (GPUs). A number of qualities of the method contribute to this suitability, ranging from locality of reference, through regularity of access patterns, to high arithmetic intensity. In this article, we illuminate a few of the more practical aspects of bringing DG onto a GPU, including the use of a Python-based metaprogramming infrastructure that was created specifically to support DG but has found many uses across all disciplines of computational science.
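Run-time code generation of the kind this abstract refers to can be sketched in plain Python: the function below emits CUDA C source that applies a fixed, square nodal matrix to every element, with the matrix entries baked in as literals and the inner loop fully unrolled. The function name, the kernel layout (one thread per element, contiguous per-element storage) and the single-precision literals are illustrative assumptions, not the paper's generator. With PyCUDA, a string like this could be compiled via `pycuda.compiler.SourceModule`; that step needs a CUDA device and is not shown.

```python
def generate_local_matvec_kernel(matrix, name="local_matvec"):
    """Return CUDA C source applying a square `matrix` to each element's nodal values.

    Hypothetical layout: one thread per element, n contiguous floats per element.
    """
    n = len(matrix)
    lines = [
        f"__global__ void {name}(const float *u, float *out, int n_elements)",
        "{",
        "  int e = blockIdx.x * blockDim.x + threadIdx.x;",
        "  if (e >= n_elements) return;",
        f"  const float *ue = u + e * {n};",
        f"  float *oute = out + e * {n};",
    ]
    for i, row in enumerate(matrix):
        # Unrolled dot product; exact zeros are dropped at generation time.
        terms = [f"{float(a)!r}f * ue[{j}]" for j, a in enumerate(row) if a != 0]
        lines.append(f"  oute[{i}] = {' + '.join(terms) or '0.0f'};")
    lines.append("}")
    return "\n".join(lines)
```

Since sizes and coefficients are fixed when the source is generated, the CUDA compiler receives straight-line code with constants, one example of the specialization that run-time generation makes cheap.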

CLASS-PH · Mar 18, 2012
A consistency condition for the vector potential in multiply-connected domains

Charles L. Epstein, Zydrunas Gimbutas, Leslie Greengard et al.

A classical problem in electromagnetics concerns the representation of the electric and magnetic fields in the low-frequency or static regime, where topology plays a fundamental role. For multiply connected conductors, at zero frequency the standard boundary conditions on the tangential components of the magnetic field do not uniquely determine the vector potential. We describe a (gauge-invariant) consistency condition that overcomes this non-uniqueness and resolves a longstanding difficulty in inverting the magnetic field integral equation.

NA · Feb 22, 2017
Fast algorithms for Quadrature by Expansion I: Globally valid expansions

Manas Rachh, Andreas Klöckner, Michael O'Neil

The use of integral equation methods for the efficient numerical solution of PDE boundary value problems requires two main tools: quadrature rules for the evaluation of layer potential integral operators with singular kernels, and fast algorithms for solving the resulting dense linear systems. Classically, these tools were developed separately. In this work, we present a unified numerical scheme based on coupling Quadrature by Expansion, a recent quadrature method, to a customized Fast Multipole Method (FMM) for the Helmholtz equation in two dimensions. The method allows the evaluation of layer potentials in linear time, anywhere in space, with a uniform, user-chosen level of accuracy as a black-box computational method. Providing this capability requires geometric and algorithmic considerations beyond the needs of standard FMMs as well as careful consideration of the accuracy of multipole translations. We illustrate the speed and accuracy of our method with various numerical examples.
Keywords: Layer Potentials; Singular Integrals; Quadrature; High-order accuracy; Integral equations; Helmholtz equation; Fast multipole method.

PL · Apr 13, 2016
Array Program Transformation with Loo.py by Example: High-Order Finite Elements

Andreas Klöckner, Lucas C. Wilcox, T. Warburton

To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines, as found in a weather model, to a single high-performance computational kernel suitable for execution on modern GPU hardware. Along the transformation path, we encounter kernel fusion, vectorization, prefetching, parallelization, and algorithmic changes achieved by mechanized conversion between imperative and functional/substitution-based code, among several others. We conclude with performance results that demonstrate the effects and support the effectiveness of the applied transformations.

NA · Aug 16, 2018
High-order Finite Element–Integral Equation Coupling on Embedded Meshes

Natalie N. Beams, Andreas Klöckner, Luke N. Olson

This paper presents a high-order method for solving an interface problem for the Poisson equation on embedded meshes through a coupled finite element and integral equation approach. The method is capable of handling homogeneous or inhomogeneous jump conditions without modification and retains high-order convergence close to the embedded interface. We present finite element-integral equation (FE-IE) formulations for interior, exterior, and interface problems. The treatments of the exterior and interface problems are new. The resulting linear systems are solved through an iterative approach exploiting the second-kind nature of the IE operator combined with algebraic multigrid preconditioning for the FE part. Assuming smooth continuations of coefficients and right-hand-side data, we present an error analysis supporting high-order accuracy. Numerical evidence further supports our claims of efficiency and high-order accuracy for smooth data.

NA · Nov 2, 2018
Conformal Mapping via a Density Correspondence for the Double-Layer Potential

Matt Wala, Andreas Klöckner

We derive a representation formula for harmonic polynomials and Laurent polynomials in terms of densities of the double-layer potential on bounded piecewise smooth and simply connected domains. From this result, we obtain a method for the numerical computation of conformal maps that applies to both exterior and interior regions. We present analysis and numerical experiments supporting the accuracy and broad applicability of the method.

NA · Nov 12, 2025
A Fast Direct Solver for Boundary Integral Equations Using Quadrature By Expansion

Alexandru Fikl, Andreas Klöckner

We construct and analyze a hierarchical direct solver for linear systems arising from the discretization of boundary integral equations using the Quadrature by Expansion (QBX) method. Our scheme builds on the existing theory of Hierarchical Semi-Separable (HSS) matrix operators that contain low-rank off-diagonal submatrices. We use proxy-based approximations of the far-field interactions and the Interpolative Decomposition (ID) to construct compressed HSS operators that are used as fast direct solvers for the original system. We describe a number of modifications to the standard HSS framework that enable compatibility with the QBX family of discretization methods. We establish an error model for the direct solver that is based on a multipole expansion of the QBX-mediated proxy interactions and standard estimates for the ID. Based on these theoretical results, we develop an automatic approach for setting scheme parameters based on user-provided error tolerances. The resulting solver seamlessly generalizes across two- and three-dimensional problems and achieves state-of-the-art asymptotic scaling. We conclude with numerical experiments that support the theoretical expectations for the error and computational cost of the direct solver.
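The Interpolative Decomposition mentioned above can be sketched with a column-pivoted QR: keep the k "skeleton" columns whose pivots exceed a relative tolerance and express every other column as a combination of them, so that A ≈ A[:, skel] @ proj. This is one standard way to compute an ID, not necessarily the variant used in the paper (SciPy also ships a ready-made ID in `scipy.linalg.interpolative`); the proxy-point construction and the HSS hierarchy are not shown, and the point clusters and logarithmic kernel in the test are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def interpolative_decomposition(A, rel_tol):
    """Column ID via pivoted QR: returns (skel, proj) with A ~= A[:, skel] @ proj."""
    _, R, perm = qr(A, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    # Pivoted QR makes |R_kk| non-increasing, so counting pivots above the
    # threshold gives a numerical rank k.
    k = int(np.count_nonzero(diag > rel_tol * diag[0]))
    # Interpolation coefficients T solve R11 @ T = R12.
    T = solve_triangular(R[:k, :k], R[:k, k:])
    proj = np.empty((k, A.shape[1]))
    proj[:, perm[:k]] = np.eye(k)      # skeleton columns reproduce themselves
    proj[:, perm[k:]] = T              # remaining columns are interpolated
    return perm[:k], proj
```

For a far-field block, such as a logarithmic kernel between two well-separated point clusters, the rank k stays small compared to the block size, and storing `skel` and `proj` in place of the full block is what makes the off-diagonal parts of a hierarchical solver cheap.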

NA · Mar 29, 2019
A Fast Algorithm for Quadrature by Expansion in Three Dimensions

Matt Wala, Andreas Klöckner

This paper presents an accelerated quadrature scheme for the evaluation of layer potentials in three dimensions. Our scheme combines a generic, high-order quadrature method for singular kernels called Quadrature by Expansion (QBX) with a modified version of the Fast Multipole Method (FMM). Our scheme extends a recently developed formulation of the FMM for QBX in two dimensions, which, in that setting, achieves mathematically rigorous error and running time bounds. In addition to the generalization to three dimensions, we highlight some algorithmic and mathematical opportunities for improved performance and stability. Lastly, we give numerical evidence supporting the accuracy, performance, and scalability of the algorithm through a series of experiments involving the Laplace and Helmholtz equations.

SE · Apr 19, 2013
GPU Scripting and Code Generation with PyCUDA

Andreas Klöckner, Nicolas Pinto, Bryan Catanzaro et al.

High-level scripting languages are in many ways polar opposites to GPUs. GPUs are highly parallel, subject to hardware subtleties, and designed for maximum throughput, and they offer a tremendous advance in the performance achievable for a significant number of computational problems. On the other hand, scripting languages such as Python favor ease of use over computational speed and do not generally emphasize parallelism. PyCUDA is a package that attempts to join the two together. This chapter argues that in doing so, a programming environment is created that is greater than just the sum of its two parts. We would like to note that nearly all of this chapter applies in unmodified form to PyOpenCL, a sister project of PyCUDA, whose goal it is to realize the same concepts as PyCUDA for OpenCL.

NA · Nov 18, 2009
Deterministic Numerical Schemes for the Boltzmann Equation

Akil Narayan, Andreas Klöckner

This article describes methods for the deterministic simulation of the collisional Boltzmann equation. It presumes that the transport and collision parts of the equation are to be simulated separately in the time domain. Time stepping schemes to achieve the splitting as well as numerical methods for each part of the operator are reviewed, with an emphasis on clearly exposing the challenges posed by the equation as well as their resolution by various schemes.
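The splitting idea can be illustrated on a toy linear system u' = (A + B)u, where the hypothetical 2×2 matrices `A` and `B` merely stand in for the transport and collision operators and do not commute. Lie splitting (transport, then collision) is first-order accurate, while symmetric Strang splitting (half transport, full collision, half transport) is second-order, so halving the step should cut the error by roughly 2 and 4, respectively. This is a generic sketch, not one of the specific schemes reviewed in the article.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical stand-ins: A plays the role of "transport", B of "collision".
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[-1.0, 0.0], [0.0, -0.5]])


def split_solve(u0, t_final, n_steps, scheme):
    """Advance u' = (A + B) u with exact sub-flows combined by operator splitting."""
    dt = t_final / n_steps
    if scheme == "lie":
        step = expm(B * dt) @ expm(A * dt)          # full transport, then full collision
    elif scheme == "strang":
        half = expm(A * dt / 2)
        step = half @ expm(B * dt) @ half           # half transport, collision, half transport
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    u = u0.copy()
    for _ in range(n_steps):
        u = step @ u
    return u


def splitting_error(scheme, n_steps, t_final=1.0):
    """Error against the unsplit exact solution expm((A + B) t) u0."""
    u0 = np.array([1.0, 0.0])
    exact = expm((A + B) * t_final) @ u0
    return np.linalg.norm(split_solve(u0, t_final, n_steps, scheme) - exact)
```

Comparing `splitting_error(scheme, 40)` with `splitting_error(scheme, 80)` for each scheme exposes the observed order of accuracy; the only error source is the splitting itself, since each sub-flow is integrated exactly here.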