Randomized QR with Column Pivoting
For practitioners who need efficient matrix factorizations on parallel systems, this work offers a faster alternative to QRCP with minimal loss of quality. The contribution is incremental in that it combines existing randomized sampling techniques with QR.
The paper tackles the communication bottleneck in QR factorization with column pivoting (QRCP) by using randomized sampling to approximate column-norm updates, achieving performance near unpivoted QR while maintaining comparable factorization quality. The method also extends to truncated QR and SVD, reducing approximation time by nearly half for small truncation ranks.
The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting comes from the column-norm updates required to make pivot decisions. We use randomized sampling to approximate this process, which dramatically reduces communication in column selection. We also introduce a sample update formula to reduce the cost of sampling trailing matrices. Using our column selection mechanism, we observe results that are comparable in quality to those obtained from the QRCP algorithm, but with performance near that of unpivoted QR. We also demonstrate strong parallel scalability on shared-memory multicore systems using an implementation in Fortran with OpenMP. This work immediately extends to producing low-rank truncated approximations of large matrices. We propose a truncated QR factorization with column pivoting that avoids the trailing matrix updates used in current implementations of level-3 BLAS QR and QRCP. Provided the truncation rank is small, avoiding trailing matrix updates reduces approximation time by nearly half. By using these techniques and employing a variation on Stewart's QLP algorithm, we develop an approximate truncated SVD that runs nearly as fast as truncated QR.
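The column selection idea in the abstract can be illustrated with a minimal single-pass sketch in Python. This is not the authors' blocked Fortran/OpenMP implementation and it omits the sample update formula. The function name `randomized_truncated_qrcp`, the Gaussian sketching matrix, and the `oversample` parameter are assumptions made for illustration only. Pivots are chosen by running ordinary QRCP on a small random sample of the matrix, and the full matrix is then factorized without pivoting and without updating the trailing columns.

```python
import numpy as np
from scipy.linalg import qr


def randomized_truncated_qrcp(A, k, oversample=8, seed=0):
    """Rank-k truncated QR with pivots chosen from a random sample.

    Hypothetical helper for illustration; returns Q (m x k), R (k x n)
    and a column permutation piv such that A[:, piv] ~= Q @ R.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(m, k + oversample)          # sample rows (assumed oversampling)

    # Compress the rows of A with a Gaussian sketch: B is ell x n.
    Omega = rng.standard_normal((ell, m))
    B = Omega @ A

    # Pivot decisions are made on the small sample instead of on A.
    _, _, piv = qr(B, mode="economic", pivoting=True)

    # Unpivoted QR of the k selected columns only.
    Q, R11 = np.linalg.qr(A[:, piv[:k]])

    # Project the remaining columns once; no trailing matrix update.
    R12 = Q.T @ A[:, piv[k:]]
    R = np.hstack([R11, R12])
    return Q, R, piv


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Test matrix of exact rank 10.
    A = rng.standard_normal((60, 10)) @ rng.standard_normal((10, 40))
    Q, R, piv = randomized_truncated_qrcp(A, k=10)
    rel_err = np.linalg.norm(A[:, piv] - Q @ R) / np.linalg.norm(A)
    print("relative approximation error:", rel_err)
```

In this sketch the only pivoted factorization acts on the `ell x n` sample, so the expensive column-norm bookkeeping never touches the full matrix. A blocked variant would repeat the selection on trailing blocks, which is where a sample update formula such as the one mentioned in the abstract would apply.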