DCMar 31

A Practical GPU-Accelerated Implementation of Orthogonal Matching Pursuit

arXiv:2407.064341.92 citationsh-index: 46
AI Analysis

This provides a practical speed improvement for sparse signal recovery tasks, though it is incremental as it optimizes an existing method.

The paper tackled the problem of inefficient Orthogonal Matching Pursuit (OMP) implementations by developing a GPU-accelerated version that leverages Cholesky inverse properties, achieving up to 310x speedup over Scikit-Learn and 26x over SPAMS.

Finding the sparsest solution to the underdetermined system $\mathbf{y}=\mathbf{Ax}$, given a tolerance, is known to be NP-hard. Many approximate solutions to this problem exist, and Orthogonal Matching Pursuit (OMP) is one of the most widely used. However, existing OMP implementations don't take full advantage of matrix properties or modern CPU and GPU-based Linear Algebra kernels. For this paper, we implemented an efficient implementation of OMP that leverages Cholesky inverse properties as well as the power of GPUs to deliver up to \textbf{310x speedup over Scikit-Learn} and \textbf{26x over SPAMS}. The package is published on PyPI (\texttt{pip install batched-omp}) and is fully scikit-learn compatible.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes