Luke N. Olson

6 papers

107 citations

Novelty: 54%

AI Score: 26

Ranked #166,264 of 205,806 authors (top 81%); #1,519 in NA (top 46%)

6 Papers

DC · Dec 15, 2015

Reducing Parallel Communication in Algebraic Multigrid through Sparsification

Amanda Bienz, Robert D. Falgout, William Gropp, Luke N. Olson et al.

Algebraic multigrid (AMG) is an $\mathcal{O}(n)$ solution process for many large sparse linear systems. A hierarchy of progressively coarser grids is constructed that utilizes complementary relaxation and interpolation operators. High-energy error is reduced by relaxation, while low-energy error is mapped to coarse grids and reduced there. However, large parallel communication costs often limit parallel scalability. As the multigrid hierarchy is formed, each coarse matrix is formed through a triple matrix product. The resulting coarse grids often have significantly more nonzeros per row than the original fine-grid operator, thereby generating high parallel communication costs on coarse levels. In this paper, we introduce a method that systematically removes entries in coarse-grid matrices after the hierarchy is formed, leading to improved communication costs. We sparsify by removing weakly connected or unimportant entries in the matrix, leading to improved solve time. The main trade-off is that if the heuristic identifying unimportant entries is used too aggressively, then AMG convergence can suffer. To counteract this, the original hierarchy is retained, allowing entries to be reintroduced into the solver hierarchy if convergence is too slow. This enables a balance between communication cost and convergence, as necessary. In this paper we present new algorithms for reducing communication and present a number of computational experiments in support.
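The growth in nonzeros per row and the effect of dropping weak entries can be illustrated with a small sketch. This is not the paper's algorithm: the model problem (2D Poisson with bilinear interpolation), the helper names, the drop tolerance `theta`, and the choice to lump dropped entries onto the diagonal are all assumptions made for illustration, and the sketch does not retain the original hierarchy for reintroducing entries.

```python
import numpy as np
import scipy.sparse as sp

def poisson_2d(n):
    """5-point Poisson matrix on an n x n grid (illustrative model problem)."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsr()

def interpolation_2d(nc):
    """Hypothetical helper: bilinear interpolation from nc x nc to (2nc+1) x (2nc+1)."""
    P1 = sp.lil_matrix((2 * nc + 1, nc))
    for j in range(nc):
        i = 2 * j + 1
        P1[i - 1, j], P1[i, j], P1[i + 1, j] = 0.5, 1.0, 0.5
    P1 = P1.tocsr()
    return sp.kron(P1, P1).tocsr()

def sparsify(A, theta):
    """Drop off-diagonal entries smaller than theta times the largest
    off-diagonal magnitude in their row; add them to the diagonal so
    that row sums are unchanged (an assumed heuristic)."""
    A = A.tocoo()
    offdiag = A.row != A.col
    row_max = np.zeros(A.shape[0])
    np.maximum.at(row_max, A.row[offdiag], np.abs(A.data[offdiag]))
    drop = offdiag & (np.abs(A.data) < theta * row_max[A.row])
    lumped = np.zeros(A.shape[0])
    np.add.at(lumped, A.row[drop], A.data[drop])
    keep = ~drop
    S = sp.csr_matrix((A.data[keep], (A.row[keep], A.col[keep])), shape=A.shape)
    S = (S + sp.diags(lumped, format="csr")).tocsr()
    S.eliminate_zeros()
    return S

nc = 7
A = poisson_2d(2 * nc + 1)
P = interpolation_2d(nc)

# Galerkin triple matrix product for the coarse operator
A_coarse = (P.T @ A @ P).tocsr()
A_coarse.eliminate_zeros()
A_sparse = sparsify(A_coarse, theta=0.6)

print("max nnz/row fine:      ", np.diff(A.indptr).max())
print("max nnz/row coarse:    ", np.diff(A_coarse.indptr).max())
print("max nnz/row sparsified:", np.diff(A_sparse.indptr).max())
```

In this setting the triple product turns a 5-point fine stencil into a 9-point coarse stencil, and the assumed threshold removes the weaker corner couplings. Whether convergence survives such dropping depends on the problem, which is the trade-off the abstract describes.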

NA · Jan 29, 2018

A Root-Node Based Algebraic Multigrid Method

Thomas A. Manteuffel, Luke N. Olson, Jacob B. Schroder et al.

This paper provides a unified and detailed presentation of root-node style algebraic multigrid (AMG). Algebraic multigrid is a popular and effective iterative method for solving large, sparse linear systems that arise from discretizing partial differential equations. However, while AMG is designed for symmetric positive definite (SPD) matrices, certain SPD problems, such as anisotropic diffusion, are still not adequately addressed by existing methods. Non-SPD problems pose an even greater challenge, and in practice AMG is often not considered as a solver for such problems. The focus of this paper is on so-called root-node AMG, which can be viewed as a combination of classical and aggregation-based multigrid. An algorithm for root-node is outlined and a filtering strategy is developed, which is able to control the cost of using root-node AMG, particularly on difficult problems. New theoretical motivation is provided for root-node and energy-minimization as applied to symmetric as well as non-symmetric systems. Numerical results are then presented demonstrating the robust ability of root-node to solve non-symmetric problems, systems-based problems, and difficult SPD problems, including strongly anisotropic diffusion, convection-diffusion, and upwind steady-state transport, in a scalable manner. New, detailed estimates of the computational cost of the setup and solve phase are given for each example, providing additional support for root-node AMG over alternative methods.

LG · Dec 10, 2022

Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation

Nicolas Nytko, Ali Taghibakhshi, Tareq Uz Zaman et al.

Sparse matrix representations are ubiquitous in computational science and machine learning, leading to significant reductions in compute time, in comparison to dense representation, for problems that have local connectivity. The adoption of sparse representation in leading ML frameworks such as PyTorch is incomplete, however, with support for both automatic differentiation and GPU acceleration missing. In this work, we present an implementation of a CSR-based sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix operations, as well as automatic differentiability. We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation and performance measurements versus their dense counterparts.
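As a rough illustration of what reverse-mode differentiation of a CSR operation involves, the sketch below writes out the forward and backward passes of a sparse matrix-vector product in plain NumPy. It is not the paper's PyTorch/CUDA implementation; the function names `csr_matvec` and `csr_matvec_vjp` are hypothetical. The point it shows is that the gradient with respect to the stored nonzeros keeps the sparsity pattern of the matrix, so no dense intermediate is needed.

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """Forward pass: y = A @ x with A stored in CSR form."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = data[lo:hi] @ x[indices[lo:hi]]
    return y

def csr_matvec_vjp(data, indices, indptr, x, y_bar):
    """Reverse pass: given y_bar = dL/dy, return (dL/d data, dL/dx)."""
    # Row index of every stored nonzero
    rows = np.repeat(np.arange(len(indptr) - 1), np.diff(indptr))
    # dL/dA[i, j] = y_bar[i] * x[j], evaluated only at stored entries
    data_bar = y_bar[rows] * x[indices]
    # dL/dx = A^T @ y_bar, accumulated by scattering into columns
    x_bar = np.zeros_like(x)
    np.add.at(x_bar, indices, data * y_bar[rows])
    return data_bar, x_bar

# A = [[2, 0, 1],
#      [0, 3, 0]]
data = np.array([2.0, 1.0, 3.0])
indices = np.array([0, 2, 1])
indptr = np.array([0, 2, 3])
x = np.array([1.0, 2.0, 3.0])

y = csr_matvec(data, indices, indptr, x)
# For the loss L = 0.5 * sum(y**2), the incoming gradient is dL/dy = y
data_bar, x_bar = csr_matvec_vjp(data, indices, indptr, x, y)
print(y, data_bar, x_bar)
```

In an automatic differentiation framework these two functions would be registered as the forward and backward of a single custom operation.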

NA · Aug 16, 2018

High-order Finite Element--Integral Equation Coupling on Embedded Meshes

Natalie N. Beams, Andreas Klöckner, Luke N. Olson

This paper presents a high-order method for solving an interface problem for the Poisson equation on embedded meshes through a coupled finite element and integral equation approach. The method is capable of handling homogeneous or inhomogeneous jump conditions without modification and retains high-order convergence close to the embedded interface. We present finite element-integral equation (FE-IE) formulations for interior, exterior, and interface problems. The treatments of the exterior and interface problems are new. The resulting linear systems are solved through an iterative approach exploiting the second-kind nature of the IE operator combined with algebraic multigrid preconditioning for the FE part. Assuming smooth continuations of coefficients and right-hand-side data, we show error analysis supporting high-order accuracy. Numerical evidence further supports our claims of efficiency and high-order accuracy for smooth data.

MS · Mar 6, 2018

Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution

Andrew Reisner, Luke N. Olson, J. David Moulton

The efficient solution of sparse, linear systems resulting from the discretization of partial differential equations is crucial to the performance of many physics-based simulations. The algorithmic optimality of multilevel approaches for common discretizations makes them a good candidate for an efficient parallel solver. Yet, modern architectures for high-performance computing systems continue to challenge the parallel scalability of multilevel solvers. While algebraic multigrid methods are robust for solving a variety of problems, the increasing importance of data locality and cost of data movement in modern architectures motivates the need to carefully exploit structure in the problem. Robust logically structured variational multigrid methods, such as Black Box Multigrid (BoxMG), maintain structure throughout the multigrid hierarchy. This avoids indirection and increased coarse-grid communication costs typical in parallel algebraic multigrid. Nevertheless, the parallel scalability of structured multigrid is challenged by coarse-grid problems where the overhead in communication dominates computation. In this paper, an algorithm is introduced for redistributing coarse-grid problems through incremental agglomeration. Guided by a predictive performance model, this algorithm provides robust redistribution decisions for structured multilevel solvers. A two-dimensional diffusion problem is used to demonstrate the significant gain in performance of this algorithm over the previous approach that used agglomeration to one processor. In addition, the parallel scalability of this approach is demonstrated on two large-scale computing systems, with solves on up to 500K+ cores.

NA · Mar 30, 2015

A Finite Element Based P3M Method for N-body Problems

Natalie N. Beams, Luke N. Olson, Jonathan B. Freund

We introduce a fast mesh-based method for computing N-body interactions that is both scalable and accurate. The method is founded on a particle-particle--particle-mesh (P3M) approach, which decomposes a potential into rapidly decaying short-range interactions and smooth, mesh-resolvable long-range interactions. However, in contrast to the traditional approach of using Gaussian screen functions to accomplish this decomposition, our method employs specially designed polynomial bases to construct the screened potentials. Because of this form of the screen, the long-range component of the potential is then solved exactly with a finite element method, leading ultimately to a sparse matrix problem that is solved efficiently with standard multigrid methods. Moreover, since this system represents an exact discretization, the optimal resolution properties of the FFT are unnecessary, though the short-range calculation is now more involved than P3M/PME methods. We introduce the method, analyze its key properties, and demonstrate the accuracy of the algorithm.
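The short-range/long-range decomposition can be made concrete with the traditional Gaussian-screened split of the 1/r potential, which is the approach the abstract contrasts itself with. The sketch below shows that Ewald-type split, not the polynomial screens of the paper; the screening parameter `alpha` and the function names are assumptions for illustration.

```python
import math

def short_range(r, alpha):
    """Rapidly decaying part of 1/r, summed directly over nearby particles."""
    return math.erfc(alpha * r) / r

def long_range(r, alpha):
    """Smooth part of 1/r (potential of a Gaussian charge), resolved on the mesh."""
    return math.erf(alpha * r) / r

alpha = 1.0
for r in (0.5, 1.0, 2.0, 5.0):
    total = short_range(r, alpha) + long_range(r, alpha)
    print(f"r={r}: short={short_range(r, alpha):.3e}, "
          f"long={long_range(r, alpha):.3e}, sum={total:.6f}, 1/r={1 / r:.6f}")
```

The two pieces sum to 1/r exactly, the short-range piece decays quickly with distance, and the long-range piece stays bounded as r approaches zero, which is what makes it representable on a mesh.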