Siegfried Cools

NA
8 papers
93 citations
Novelty: 44%
AI Score: 22

8 Papers

NA · Nov 29, 2017
Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

Siegfried Cools, Emrullah Fatih Yetkin, Emmanuel Agullo et al.

Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication hiding, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper studies the influence of local rounding errors that are introduced by the additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare it to that of the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy in the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG, while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems.
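
The auxiliary recurrences and the idea of residual replacement can be illustrated with a minimal NumPy sketch of unpreconditioned pipelined CG. It is not the authors' code: the function name `pipelined_cg` and the `replace_every` parameter are assumptions made for illustration, and the fixed replacement period is a simplification that stands in for the automated, estimate-based criterion described in the abstract. The small shifted 1D Laplacian is chosen only because it is cheap and well conditioned.

```python
import numpy as np

def pipelined_cg(A, b, tol=1e-8, max_iter=1000, replace_every=None):
    """Unpreconditioned pipelined CG (illustrative sketch).

    replace_every is a hypothetical knob: every `replace_every` iterations
    the recursively updated vectors r, w, s, z are recomputed explicitly.
    """
    x = np.zeros_like(b)
    r = b - A @ x              # residual
    w = A @ r                  # auxiliary vector, w = A r
    p = np.zeros_like(b)       # search direction
    s = np.zeros_like(b)       # auxiliary vector, s = A p
    z = np.zeros_like(b)       # auxiliary vector, z = A s
    gamma_old, alpha = 1.0, 1.0
    b_norm = np.linalg.norm(b)
    iterations = 0
    for i in range(max_iter):
        gamma = r @ r          # the two dot products form the global reduction
        delta = w @ r
        if np.sqrt(gamma) <= tol * b_norm:
            break
        q = A @ w              # matvec a parallel code would overlap with the reduction
        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)
        # Recurrences: the extra ones (z, s, w) are where rounding errors enter.
        z = q + beta * z
        s = w + beta * s
        p = r + beta * p
        x = x + alpha * p
        r = r - alpha * s
        w = w - alpha * z
        gamma_old = gamma
        iterations = i + 1
        if replace_every and iterations % replace_every == 0:
            # Residual replacement: reset recursive vectors to explicit values.
            r = b - A @ x
            w = A @ r
            s = A @ p
            z = A @ s
    return x, r, iterations

# Small SPD test problem (assumption: shifted 1D Laplacian, exact solution = ones).
n = 64
A = 2.05 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = A @ np.ones(n)

for period in (None, 10):
    x, r, its = pipelined_cg(A, b, replace_every=period)
    gap = np.linalg.norm((b - A @ x) - r)   # true residual minus recursive residual
    print(f"replace_every={period}: iterations={its}, residual gap={gap:.2e}")
```

The printed gap is the quantity whose growth limits the attainable accuracy; on harder problems than this toy example it can become orders of magnitude larger than for classic CG.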

NA · Sep 6, 2013
A new level-dependent coarsegrid correction scheme for indefinite Helmholtz problems

Siegfried Cools, Bram Reps, Wim Vanroose

In this paper we construct and analyse a level-dependent coarsegrid correction scheme for indefinite Helmholtz problems. This adapted multigrid method is capable of solving the Helmholtz equation on the finest grid using a series of multigrid cycles with a grid-dependent complex shift, leading to a stable correction scheme on all levels. It is rigorously shown that the adaptation of the complex shift throughout the multigrid cycle maintains the functionality of the two-grid correction scheme, as no smooth modes are amplified in or added to the error. In addition, a sufficiently smoothing relaxation scheme should be applied to ensure damping of the oscillatory error components. Numerical experiments on various benchmark problems show the method to be competitive with, or even to outperform, the current state-of-the-art multigrid-preconditioned Krylov methods, such as CSL-preconditioned GMRES or BiCGStab.

NA · Jan 21, 2015
A multi-level preconditioned Krylov method for the efficient solution of algebraic tomographic reconstruction problems

Siegfried Cools, Pieter Ghysels, Wim van Aarle et al.

Classical iterative methods for tomographic reconstruction include the class of Algebraic Reconstruction Techniques (ART). Convergence of these stationary linear iterative methods is, however, notably slow. In this paper we propose the use of Krylov solvers for tomographic linear inversion problems. These advanced iterative methods feature fast convergence at the expense of a higher computational cost per iteration, causing them to be generally uncompetitive without the inclusion of a suitable preconditioner. Combining elements from standard multigrid (MG) solvers and the theory of wavelets, a novel wavelet-based multi-level (WMG) preconditioner is introduced, which is shown to significantly speed up Krylov convergence. The performance of the WMG-preconditioned Krylov method is analyzed through a spectral analysis, and the approach is compared to existing methods like the classical Simultaneous Iterative Reconstruction Technique (SIRT) and unpreconditioned Krylov methods on a 2D tomographic benchmark problem. Numerical experiments are promising, showing the method to be competitive with the classical Algebraic Reconstruction Techniques in terms of convergence speed and overall performance (CPU time) as well as precision of the reconstruction.
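
The contrast between a stationary method and a Krylov method can be sketched on a toy problem. The code below compares SIRT with unpreconditioned CGLS (CG on the normal equations) at an equal iteration count. Several things here are assumptions: the abstract does not name a specific Krylov solver, so CGLS is used only as a common example; the small random nonnegative matrix merely stands in for a tomographic projection matrix; and the WMG preconditioner itself is not reproduced.

```python
import numpy as np

def sirt(A, b, iterations):
    """SIRT: x <- x + C A^T R (b - A x), with inverse column and row sums."""
    R = 1.0 / A.sum(axis=1)    # assumes nonnegative A with nonzero rows
    C = 1.0 / A.sum(axis=0)    # assumes nonzero columns
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        x = x + C * (A.T @ (R * (b - A @ x)))
    return x

def cgls(A, b, iterations, tol=1e-12):
    """CGLS: conjugate gradients applied to A^T A x = A^T b."""
    x = np.zeros(A.shape[1])
    r = b.copy()               # residual b - A x for the zero initial guess
    s = A.T @ r                # normal-equations residual
    p = s.copy()
    gamma = s @ s
    gamma0 = gamma
    for _ in range(iterations):
        if gamma <= tol**2 * gamma0:
            break              # normal-equations residual is negligible
        q = A @ p
        alpha = gamma / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

# Toy consistent system (assumption: random stand-in for a projection matrix).
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(40, 10))
x_true = rng.uniform(0.0, 1.0, size=10)
b = A @ x_true

x_sirt = sirt(A, b, iterations=20)
x_cgls = cgls(A, b, iterations=20)
print("SIRT residual:", np.linalg.norm(b - A @ x_sirt))
print("CGLS residual:", np.linalg.norm(b - A @ x_cgls))
```

On realistic tomographic systems the per-iteration cost and the ill-conditioning are what make preconditioning relevant; this toy comparison only shows the difference in convergence per iteration.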

AP · Dec 23, 2016
On the optimality of shifted Laplacian in the class of expansion preconditioners for the Helmholtz equation

Siegfried Cools, Wim Vanroose

This paper introduces and explores the class of expansion preconditioners EX(m), which forms a direct generalization of the classic complex shifted Laplace (CSL) preconditioner for Helmholtz problems. The construction of the EX(m) preconditioner is based upon a truncated Taylor series expansion of the original Helmholtz operator inverse. The expansion preconditioner is shown to significantly improve Krylov solver convergence rates for the Helmholtz problem for growing values of the number of series terms m. However, the addition of multiple terms in the expansion also increases the computational cost of applying the preconditioner. A thorough cost-benefit analysis of the addition of extra terms in the EX(m) preconditioner proves that the CSL or EX(1) preconditioner is the most efficient member of the expansion preconditioner class in practice. Additionally, possible extensions to the expansion preconditioner class that further increase preconditioner efficiency are suggested.
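
To make the role of the complex shift concrete, the following sketch looks only at the CSL (that is, EX(1)) case on a 1D model problem; the higher-order EX(m) members are not reproduced because their exact form is not given here. The discretization, the wavenumber, the shift parameter `beta`, and the sign convention of the shift are all assumptions chosen for illustration. Because the Helmholtz matrix and its shifted version share eigenvectors in this setting, the eigenvalues of the preconditioned operator can be computed directly from those of the Laplacian.

```python
import numpy as np

n = 200                       # interior grid points (assumption)
h = 1.0 / (n + 1)
wavenumber = 30.0             # assumption, gives an indefinite problem
beta = 0.5                    # complex shift parameter (assumption)

# 1D Dirichlet Laplacian, i.e. the negative second derivative.
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
lam = np.linalg.eigvalsh(L)   # real, positive eigenvalues

# Helmholtz matrix  A = L - k^2 I            has eigenvalues lam - k^2.
# CSL matrix        M = L - (1 + i beta) k^2 I shares the same eigenvectors.
k2 = wavenumber**2
mu = (lam - k2) / (lam - (1.0 + 1j * beta) * k2)   # eigenvalues of M^{-1} A

print("A is indefinite:", lam.min() < k2 < lam.max())
print("max | |mu - 1/2| - 1/2 | =", np.max(np.abs(np.abs(mu - 0.5) - 0.5)))
```

In this model setting the preconditioned eigenvalues lie on the circle of radius 1/2 centered at 1/2 in the complex plane, which is a bounded set even though the unpreconditioned spectrum is indefinite and spreads with the grid size.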

NA · Mar 25, 2019
Analyzing and improving maximal attainable accuracy in the communication hiding pipelined BiCGStab method

Siegfried Cools

Pipelined Krylov subspace methods avoid communication latency by reducing the number of global synchronization bottlenecks and by hiding global communication behind useful computational work. In exact arithmetic, pipelined Krylov subspace algorithms are equivalent to classic Krylov subspace methods and generate identical series of iterates. However, as a consequence of the reformulation of the algorithm to improve parallelism, pipelined methods may suffer from severely reduced attainable accuracy in a practical finite precision setting. This work presents a numerical stability analysis that describes and quantifies the impact of local rounding error propagation on the maximal attainable accuracy of the multi-term recurrences in the preconditioned pipelined BiCGStab method. Theoretical expressions for the gaps between the true and computed residual as well as other auxiliary variables used in the algorithm are derived, and the elementary dependencies between the gaps on the various recursively computed vector variables are analyzed. The norms of the corresponding propagation matrices and vectors provide insight into the possible amplification of local rounding errors throughout the algorithm. Stability of the pipelined BiCGStab method is compared numerically to that of pipelined CG on a symmetric benchmark problem. Furthermore, numerical evidence supporting the effectiveness of employing a residual replacement type strategy to improve the maximal attainable accuracy for the pipelined BiCGStab method is provided.

NA · Aug 21, 2018
Numerical analysis of the maximal attainable accuracy in communication hiding pipelined Conjugate Gradient methods

Siegfried Cools

Krylov subspace methods are widely known as efficient algebraic methods for solving large scale linear systems. However, on massively parallel hardware the performance of these methods is typically limited by communication latency rather than floating point performance. With HPC hardware advancing towards the exascale regime, the gap between computation and communication keeps steadily increasing, imposing the need for scalable alternatives to traditional Krylov subspace methods. One such approach is the class of so-called pipelined Krylov subspace methods, which reduce the number of global synchronization points and overlap global communication latency with local arithmetic operations, thus hiding the global reduction phases behind useful computations. To obtain this overlap the traditional Krylov subspace algorithm is reformulated by introducing a number of auxiliary vector quantities, which are computed using additional recurrence relations. Although pipelined Krylov subspace methods are equivalent to traditional Krylov subspace methods in exact arithmetic, local rounding errors induced by the multi-term recurrence relations in finite precision may in practice affect convergence significantly. This numerical stability study aims to characterize the effect of local rounding errors on attainable accuracy in various pipelined versions of the popular Conjugate Gradient method. Expressions for the gaps between the true and recursively computed variables that are used to update the search directions in the different CG variants are derived. Furthermore, it is shown how these results can be used to analyze and correct the effect of local rounding error propagation on the maximal attainable accuracy of pipelined CG methods. The analysis in this work is supplemented by numerical experiments that demonstrate the numerical behavior of the pipelined CG methods.
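
As a sketch of what such a gap expression looks like, consider only the residual recurrence, with notation assumed here rather than taken from the paper: $\bar{x}_i, \bar{r}_i, \bar{p}_i, \bar{s}_i, \bar{w}_i$ are the computed vectors, $\xi_i^{x}, \xi_i^{r}, \xi_i^{p}, \xi_i^{s}$ are the local rounding errors of the corresponding updates, $f_i$ is the residual gap, and $g_i$, $h_i$ are the gaps on the auxiliary vectors. The local error terms are left unbounded; this is a derivation outline under these assumptions, not the full analysis.

```latex
% Classic CG: the residual is updated with an explicit product A p_i.
\bar{x}_{i+1} = \bar{x}_i + \alpha_i \bar{p}_i + \xi_i^{x}, \qquad
\bar{r}_{i+1} = \bar{r}_i - \alpha_i A\bar{p}_i + \xi_i^{r}
\;\Longrightarrow\;
f_{i+1} := (b - A\bar{x}_{i+1}) - \bar{r}_{i+1}
         = f_i - A\xi_i^{x} - \xi_i^{r}.

% Pipelined CG: A p_i is replaced by the recursively computed vector s_i.
\bar{r}_{i+1} = \bar{r}_i - \alpha_i \bar{s}_i + \xi_i^{r}
\;\Longrightarrow\;
f_{i+1} = f_i - \alpha_i g_i - A\xi_i^{x} - \xi_i^{r},
\qquad g_i := A\bar{p}_i - \bar{s}_i .

% The gap on s_i is itself driven by the gap on w_i = A r_i:
\bar{p}_i = \bar{r}_i + \beta_i \bar{p}_{i-1} + \xi_i^{p}, \quad
\bar{s}_i = \bar{w}_i + \beta_i \bar{s}_{i-1} + \xi_i^{s}
\;\Longrightarrow\;
g_i = h_i + \beta_i g_{i-1} + A\xi_i^{p} - \xi_i^{s},
\qquad h_i := A\bar{r}_i - \bar{w}_i .
```

In classic CG the gap only accumulates local errors, whereas in the pipelined variant the gaps on the auxiliary vectors feed into one another, which is how local rounding errors can be amplified.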

NA · May 15, 2019
Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method

Siegfried Cools, Jeffrey Cornelis, Wim Vanroose

Pipelined Krylov subspace methods (also referred to as communication-hiding methods) have been proposed in the literature as a scalable alternative to classic Krylov subspace algorithms for iteratively computing the solution to a large linear system in parallel. For symmetric and positive definite system matrices the pipelined Conjugate Gradient method outperforms its classic Conjugate Gradient counterpart on large scale distributed memory hardware by overlapping global communication with essential computations like the matrix-vector product, thus hiding global communication. A well-known drawback of the pipelining technique is the (possibly significant) loss of numerical stability. In this work a numerically stable variant of the pipelined Conjugate Gradient algorithm is presented that avoids the propagation of local rounding errors in the finite precision recurrence relations that construct the Krylov subspace basis. The multi-term recurrence relation for the basis vector is replaced by two-term recurrences, improving stability without increasing the overall computational cost of the algorithm. The proposed modification ensures that the pipelined Conjugate Gradient method is able to attain a highly accurate solution independently of the pipeline length. Numerical experiments demonstrate a combination of excellent parallel performance and improved maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm. This work thus resolves one of the major practical restrictions on the usability of pipelined Krylov subspace methods.

NA · Sep 6, 2018
Numerically Stable Variants of the Communication-hiding Pipelined Conjugate Gradients Algorithm for the Parallel Solution of Large Scale Symmetric Linear Systems

Siegfried Cools, Wim Vanroose

By reducing the number of global synchronization bottlenecks per iteration and hiding communication behind useful computational work, pipelined Krylov subspace methods achieve significantly improved parallel scalability on present-day HPC hardware. However, this typically comes at the cost of a reduced maximal attainable accuracy. This paper presents and compares several stabilized versions of the communication-hiding pipelined Conjugate Gradients method. The main novel contribution of this work is the reformulation of the multi-term recurrence pipelined CG algorithm by introducing shifts in the recursions for specific auxiliary variables. These shifts reduce the amplification of local rounding errors on the residual. The stability analysis presented in this work provides a rigorous method for selection of the optimal shift value in practice. It is shown that, given a proper choice for the shift parameter, the resulting shifted pipelined CG algorithm restores the attainable accuracy and displays nearly identical robustness to local rounding error propagation compared to classical CG. Numerical results on a variety of SPD benchmark problems compare different stabilization techniques for the pipelined CG algorithm, showing that the shifted pipelined CG algorithm is able to attain a high accuracy while displaying excellent parallel performance.