Laura Grigori

h-index: 20

Papers: 8

Citations: 584

Novelty: 54%

AI Score: 27

Ranked #156,375 of 194,257 authors (top 80%); #975 in NA (top 40%)

8 Papers

10.8 | NA | Aug 19, 2008

Communication-optimal parallel and sequential QR and LU factorizations

James Demmel, Laura Grigori, Mark Hoemmen et al.

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending known lower bounds on communication bandwidth for sequential and parallel matrix multiplication to provide latency lower bounds, and show these bounds apply to the LU and QR decompositions. We not only show that our QR algorithms attain these lower bounds (up to polylogarithmic factors), but that existing LAPACK and ScaLAPACK algorithms perform asymptotically more communication. We also point out recent LU algorithms in the literature that attain at least some of these lower bounds.

4.3 | NA | Aug 12, 2012

LU factorization with panel rank revealing pivoting and its communication avoiding version

Amal Khabou, James W. Demmel, Laura Grigori et al.

We present the LU decomposition with panel rank revealing pivoting (LU_PRRP), an LU factorization algorithm based on strong rank revealing QR panel factorization. LU_PRRP is more stable than Gaussian elimination with partial pivoting (GEPP). Our extensive numerical experiments show that the new factorization scheme is as numerically stable as GEPP in practice, but it is more resistant to pathological cases and easily solves the Wilkinson matrix and the Foster matrix. We also present CALU_PRRP, a communication avoiding version of LU_PRRP that minimizes communication. CALU_PRRP is based on tournament pivoting, with the selection of the pivots at each step of the tournament being performed via strong rank revealing QR factorization. CALU_PRRP is more stable than CALU, the communication avoiding version of GEPP. CALU_PRRP is also more stable in practice and is resistant to pathological cases on which GEPP and CALU fail.

4.3 | NA | Sep 14, 2008

Implementing Communication-Optimal Parallel and Sequential QR Factorizations

James Demmel, Laura Grigori, Mark Hoemmen et al.

We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication and are as stable as Householder QR. The sequential and parallel algorithms for tall and skinny matrices lead to significant speedups in practice over some of the existing algorithms, including LAPACK and ScaLAPACK, for example up to 6.7x over ScaLAPACK. The parallel algorithm for general rectangular matrices is estimated to show significant speedups over ScaLAPACK, up to 22x.

3.3 | NA | Nov 4, 2013

Solution of the optimal assignment problem by diagonal scaling algorithms

Meisam Sharify, Stéphane Gaubert, Laura Grigori

We show that a solution of the optimal assignment problem can be obtained as the limit of the solution of an entropy maximization problem, as a deformation parameter tends to infinity. This allows us to apply entropy maximization algorithms to the optimal assignment problem. In particular, the Sinkhorn algorithm leads to a parallelizable method, which can be used as a preprocessing step to handle large dense optimal assignment problems. This parallel preprocessing allows one to delete entries which do not belong to optimal permutations, leading to a reduced instance which becomes solvable with limited memory requirements.
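
The diagonal scaling idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `sinkhorn_scale`, the kernel `exp(p * benefit)`, the iteration count, and the pruning threshold `0.1` are all assumptions made for illustration. The code alternates row and column normalizations (the Sinkhorn iteration) and then keeps only the entries that remain large, in the spirit of the preprocessing described above.

```python
import math

def sinkhorn_scale(benefit, p, iters=200):
    """Scale K[i][j] = exp(p * benefit[i][j]) toward a doubly stochastic matrix.

    `p` plays the role of the deformation parameter (an assumed parametrization).
    """
    n = len(benefit)
    top = max(max(row) for row in benefit)
    # Subtracting the global maximum only rescales K by a constant and avoids overflow.
    K = [[math.exp(p * (b - top)) for b in row] for row in benefit]
    r = [1.0] * n  # row scaling factors
    c = [1.0] * n  # column scaling factors
    for _ in range(iters):
        # Normalize rows, then columns; each sweep is independent per row/column.
        r = [1.0 / sum(K[i][j] * c[j] for j in range(n)) for i in range(n)]
        c = [1.0 / sum(K[i][j] * r[i] for i in range(n)) for j in range(n)]
    return [[r[i] * K[i][j] * c[j] for j in range(n)] for i in range(n)]

# Hypothetical 3x3 benefit matrix (maximize the sum of selected entries).
benefit = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
scaled = sinkhorn_scale(benefit, p=10.0)

# Largest entry per row, and the entries surviving an assumed pruning threshold.
assignment = [max(range(3), key=lambda j: row[j]) for row in scaled]
kept = [(i, j) for i in range(3) for j in range(3) if scaled[i][j] > 0.1]
```

For this small instance, enumerating all six permutations shows that the best assignment pairs rows 0, 1, 2 with columns 0, 2, 1 (total benefit 11), which the scaled matrix is expected to highlight. Larger values of `p` sharpen the scaled matrix but tend to slow the plain iteration, so the choice of `p` and of the threshold is a trade-off left open here.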

1.2 | NA | Aug 29, 2008

Communication-optimal parallel and sequential QR and LU factorizations: theory and practice

James Demmel, Laura Grigori, Mark Hoemmen et al.

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny QR (TSQR), factors m-by-n matrices in a one-dimensional (1-D) block cyclic row layout, and is optimized for m >> n. Our second algorithm, CAQR (Communication-Avoiding QR), factors general rectangular matrices distributed in a two-dimensional block cyclic layout. It invokes TSQR for each block column factorization.
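
A minimal sketch of the TSQR idea follows, assuming NumPy is available. It uses a single flat reduction over row blocks rather than any particular reduction tree or data layout from the paper, and it computes only the R factor; the function name `tsqr_r` and the parameter `num_blocks` are hypothetical. Each row block stands in for the rows held by one processor.

```python
import numpy as np

def tsqr_r(A, num_blocks=4):
    """Return an R factor of a tall-skinny matrix A using local QRs plus one final QR."""
    blocks = np.array_split(A, num_blocks, axis=0)
    # Local step: each block is factored independently (no communication needed).
    local_rs = [np.linalg.qr(block, mode="r") for block in blocks]
    # Reduction step: only the small n-by-n R factors are gathered and factored again.
    return np.linalg.qr(np.vstack(local_rs), mode="r")

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))  # m >> n
R_tsqr = tsqr_r(A)
R_ref = np.linalg.qr(A, mode="r")   # reference R from a direct QR
```

The two R factors can differ in the signs of their rows, so a fair comparison looks at absolute values or at `R.T @ R` against `A.T @ A`. The point of the sketch is that the data exchanged in the reduction step has size proportional to n-by-n per block, independent of m.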

1.2 | NA | Mar 15, 2011

Generalized Filtering Decomposition

Laura Grigori, Frédéric Nataf

This paper introduces a new preconditioning technique that is suitable for matrices arising from the discretization of a system of PDEs on unstructured grids. The preconditioner satisfies a so-called filtering property, which ensures that the input matrix is identical with the preconditioner on a given filtering vector. This vector is chosen to alleviate the effect of low frequency modes on convergence and so decrease or eliminate the plateau which is often observed in the convergence of iterative methods. In particular, the paper presents a general approach that makes it possible to ensure that the filtering condition is satisfied in a matrix decomposition. The input matrix can have an arbitrary sparse structure. Hence, it can be reordered using nested dissection, to allow a parallel computation of the preconditioner and of the iterative process.

1.2 | NA | Mar 7, 2017

URV Factorization with Random Orthogonal System Mixing

Stephen Becker, James Folberth, Laura Grigori

The unpivoted and pivoted Householder QR factorizations are ubiquitous in numerical linear algebra. A difficulty with pivoted Householder QR is the communication bottleneck introduced by pivoting. In this paper we propose using random orthogonal systems to quickly mix together the columns of a matrix before computing an unpivoted QR factorization. This method computes a URV factorization which forgoes expensive pivoted QR steps in exchange for mixing in advance, followed by a cheaper, unpivoted QR factorization. The mixing step typically reduces the variability of the column norms, and in certain experiments, allows us to compute an accurate factorization where a plain, unpivoted QR performs poorly. We experiment with linear least-squares, rank-revealing factorizations, and the QLP approximation, and conclude that our randomized URV factorization behaves comparably to a similar randomized rank-revealing URV factorization, but at a fraction of the computational cost. Our experiments provide evidence that our proposed factorization might be rank-revealing with high probability.
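
The mix-then-factor structure described above can be sketched as follows, assuming NumPy. The mixing matrix here is a random sign flip combined with an explicitly formed Hadamard matrix, which is one common kind of random orthogonal system; whether it matches the transform used in the paper is an assumption, as are the names `hadamard` and `ros_urv`. A practical version would apply a fast transform instead of forming the matrix.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n-by-n Hadamard matrix (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def ros_urv(A, rng):
    """Mix the columns of A with a random orthogonal matrix, then run unpivoted QR."""
    n = A.shape[1]
    signs = rng.choice([-1.0, 1.0], size=n)
    # M = D * H / sqrt(n) is orthogonal: random signs D followed by a Hadamard mix.
    M = (signs[:, None] * hadamard(n)) / np.sqrt(n)
    U, R = np.linalg.qr(A @ M)  # unpivoted QR of the mixed matrix
    V = M.T                     # so that A = U @ R @ V
    return U, R, V

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 8))
U, R, V = ros_urv(A, rng)
```

Because `M` is orthogonal, `U @ R @ V` reproduces `A` up to rounding, with `U` having orthonormal columns, `R` upper triangular, and `V` orthogonal. The sketch only shows the shape of the factorization; it says nothing about rank-revealing quality, which the abstract treats as an experimental observation.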

3.4 | LG | May 27, 2019

Parallel and Communication Avoiding Least Angle Regression

S. Das, J. Demmel, K. Fountoulakis et al.

We are interested in parallelizing the Least Angle Regression (LARS) algorithm for fitting linear regression models to high-dimensional data. We consider two parallel and communication avoiding versions of the basic LARS algorithm. The two algorithms have different asymptotic costs and practical performance. One offers more speedup and the other produces more accurate output. The first is bLARS, a block version of the LARS algorithm, where we update b columns at each iteration. Assuming that the data are row-partitioned, bLARS reduces the number of arithmetic operations, latency, and bandwidth by a factor of b. The second is Tournament-bLARS (T-bLARS), a tournament version of LARS where processors compete by running several LARS computations in parallel to choose b new columns to be added to the solution. Assuming that the data are column-partitioned, T-bLARS reduces latency by a factor of b. Similarly to LARS, our proposed methods generate a sequence of linear models. We present extensive numerical experiments that illustrate speedups up to 4x compared to LARS without any compromise in solution quality.