Gustavo Chávez

NA
4papers
25citations
Novelty43%
AI Score21

4 Papers

APJan 4, 2018
Accelerated Cyclic Reduction: A Distributed-Memory Fast Solver for Structured Linear Systems

Gustavo Chávez, George Turkiyyah, Stefano Zampini et al.

We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast direct solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between Cyclic Reduction and hierarchical matrix arithmetic operations result in a solver that has $O(k~N \log N~(\log N + k^2))$ arithmetic complexity and $O(k~N \log N)$ memory footprint, where $N$ is the number of degrees of freedom and $k$ is the rank of a typical off-diagonal block, and which exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing with the multifrontal method where hierarchical semi-separable matrices are used for compressing the fronts, and with algebraic multigrid. Over a set of large-scale elliptic systems with features of nonsymmetry and indefiniteness, the robustness of the direct solvers extends beyond that of the multigrid solver, and relative to the multifrontal approach ACR has lower or comparable execution time and memory footprint. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides.

NADec 24, 2017
Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

Gustavo Chávez, George Turkiyyah, Stefano Zampini et al.

We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, constant and variable coefficients demonstrate the preconditioner applicability and robustness. Furthermore, it is possible to control the number of iterations via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and the tuning of the admissibility condition parameter. Together, these parameters allow for optimization of the memory requirements and performance of the preconditioner.

NAApr 3, 2016
A Direct Elliptic Solver Based on Hierarchically Low-rank Schur Complements

Gustavo Chávez, George Turkiyyah, David Keyes

A parallel fast direct solver for rank-compressible block tridiagonal linear systems is presented. Algorithmic synergies between Cyclic Reduction and Hierarchical matrix arithmetic operations result in a solver with $O(N \log^2 N)$ arithmetic complexity and $O(N \log N)$ memory footprint. We provide a baseline for performance and applicability by comparing with well known implementations of the $\mathcal{H}$-LU factorization and algebraic multigrid with a parallel implementation that leverages the concurrency features of the method. Numerical experiments reveal that this method is comparable with other fast direct solvers based on Hierarchical Matrices such as $\mathcal{H}$-LU and that it can tackle problems where algebraic multigrid fails to converge.

NAOct 11, 2018
Matrix-free construction of HSS representation using adaptive randomized sampling

Christopher Gorman, Gustavo Chávez, Pieter Ghysels et al.

We present new algorithms for the randomized construction of hierarchically semi-separable matrices, addressing several practical issues. The HSS construction algorithms use a partially matrix-free, adaptive randomized projection scheme to determine the maximum off-diagonal block rank. We develop both relative and absolute stopping criteria to determine the minimum dimension of the random projection matrix that is sufficient for the desired accuracy. Two strategies are discussed to adaptively enlarge the random sample matrix: repeated doubling of the number of random vectors, and iteratively incrementing the number of random vectors by a fixed number. The relative and absolute stopping criteria are based on probabilistic bounds for the Frobenius norm of the random projection of the Hankel blocks of the input matrix. We discuss parallel implementation and computation and communication cost of both variants. Parallel numerical results for a range of applications, including boundary element method matrices and quantum chemistry Toeplitz matrices, show the effectiveness, scalability and numerical robustness of the proposed algorithms.