Mathias Jacquelin

2papers

2 Papers

DCOct 26, 2016
The Reverse Cuthill-McKee Algorithm in Distributed-Memory

Ariful Azad, Mathias Jacquelin, Aydin Buluc et al.

Ordering vertices of a graph is key to minimize fill-in and data structure size in sparse direct solvers, maximize locality in iterative solvers, and improve performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix decomposition. We achieve high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives. Our implementation shows strong scaling up to 1024 cores for smaller matrices and up to 4096 cores for larger matrices.

MSAug 14, 2017
PSelInv - A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric Case

Mathias Jacquelin, Lin Lin, Chao Yang

This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non- symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L, U are lower and upper triangular matrices, and P, Q are permutation matrices, respectively. The PSelInv method computes selected elements of A-1. The selection is confined by the sparsity pattern of the matrix AT . Our algorithm does not assume any symmetry properties of A, and our parallel implementation is memory efficient, in the sense that the computed elements of A-T overwrites the sparse matrix L+U in situ. PSelInv involves a large number of collective data communication activities within different processor groups of various sizes. In order to minimize idle time and improve load balancing, tree-based asynchronous communication is used to coordinate all such collective communication. Numerical results demonstrate that PSelInv can scale efficiently to 6,400 cores for a variety of matrices.