MS · DS · NA · NA · Jul 13, 2017

Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

arXiv:1707.05141 · 50 citations
AI Analysis

This work accelerates hierarchical matrix operations on GPUs, benefiting applications in scientific computing and data analysis that rely on low-rank approximations.

The authors present high-performance batched QR and SVD algorithms for GPUs, achieving substantial speedups over streamed cuSOLVER SVDs. These routines enable efficient hierarchical matrix compression on GPUs.

We present high-performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU, with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low-rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
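To make the building block named in the abstract concrete, here is a minimal sequential sketch of a one-sided Jacobi SVD in pure Python, applied to a batch by a plain loop. It is not the authors' GPU kernels: the function names `one_sided_jacobi_svd` and `batched_svd`, the tolerance, and the sweep limit are all assumptions chosen for illustration. The sketch assumes each matrix is a list of rows with at least as many rows as columns.

```python
import math


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """SVD of one small m x n matrix (list of rows, m >= n) by one-sided Jacobi.

    Returns (u, sigma, v), where u[k] and v[k] are the k-th left and right
    singular vectors and sigma is sorted in descending order.
    Illustrative sketch only; names and defaults are hypothetical.
    """
    m, n = len(a), len(a[0])
    # Store columns so that each plane rotation touches two contiguous lists.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    v = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the working columns and to V.
                for mat in (cols, v):
                    xp, xq = mat[p], mat[q]
                    mat[p] = [c * x - s * y for x, y in zip(xp, xq)]
                    mat[q] = [s * x + c * y for x, y in zip(xp, xq)]
        if not rotated:
            break

    # Column norms are the singular values; normalized columns form U.
    sigma = [math.sqrt(sum(x * x for x in col)) for col in cols]
    order = sorted(range(n), key=lambda j: -sigma[j])
    u = [[x / sigma[j] if sigma[j] > 0.0 else 0.0 for x in cols[j]] for j in order]
    return u, [sigma[j] for j in order], [v[j] for j in order]


def batched_svd(batch):
    """Factor every matrix in the batch independently (sequential stand-in)."""
    return [one_sided_jacobi_svd(a) for a in batch]
```

Each matrix in the batch is factored independently, and within a sweep the rotations of disjoint column pairs do not interact, which is the parallelism the abstract alludes to. In this sketch both levels are plain loops; how the paper maps them onto GPU kernels is not shown here.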

