NAFeb 3, 2016
Inv-ASKIT: A Parallel Fast Diret Solver for Kernel MatricesChenhan D. Yu, William B. March, Bo Xiao et al.
We present a parallel algorithm for computing the approximate factorization of an $N$-by-$N$ kernel matrix. Once this factorization has been constructed (with $N \log^2 N $ work), we can solve linear systems with this matrix with $N \log N $ work. Kernel matrices represent pairwise interactions of points in metric spaces. They appear in machine learning, approximation theory, and computational physics. Kernel matrices are typically dense (matrix multiplication scales quadratically with $N$) and ill-conditioned (solves can require 100s of Krylov iterations). Thus, fast algorithms for matrix multiplication and factorization are critical for scalability. Recently we introduced ASKIT, a new method for approximating a kernel matrix that resembles N-body methods. Here we introduce INV-ASKIT, a factorization scheme based on ASKIT. We describe the new method, derive complexity estimates, and conduct an empirical study of its accuracy and scalability. We report results on real-world datasets including "COVTYPE" ($0.5$M points in 54 dimensions), "SUSY" ($4.5$M points in 8 dimensions) and "MNIST" (2M points in 784 dimensions) using shared and distributed memory parallelism. In our largest run we approximately factorize a dense matrix of size 32M $\times$ 32M (generated from points in 64 dimensions) on 4,096 Sandy-Bridge cores. To our knowledge these results improve the state of the art by several orders of magnitude.
DCJan 9, 2017
An $N \log N$ Parallel Fast Direct Solver for Kernel MatricesChenhan D. Yu, William B. March, George Biros
Kernel matrices appear in machine learning and non-parametric statistics. Given $N$ points in $d$ dimensions and a kernel function that requires $\mathcal{O}(d)$ work to evaluate, we present an $\mathcal{O}(dN\log N)$-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, solving a linear system with a kernel matrix can be done with $\mathcal{O}(N\log N)$ work. Our algorithm only requires kernel evaluations and does not require that the kernel matrix admits an efficient global low rank approximation. Instead our factorization only assumes low-rank properties for the off-diagonal blocks under an appropriate row and column ordering. We also present a hybrid method that, when the factorization is prohibitively expensive, combines a partial factorization with iterative methods. As a highlight, we are able to approximately factorize a dense $11M\times11M$ kernel matrix in 2 minutes on 3,072 x86 "Haswell" cores and a $4.5M\times4.5M$ matrix in 1 minute using 4,352 "Knights Landing" cores.
NAJul 1, 2017
Geometry-Oblivious FMM for Compressing Dense SPD MatricesChenhan D. Yu, James Levitt, Severin Reiz et al.
We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in $N \log N$ or even $N$ time, where $N$ is the matrix size. Compression requires $N \log N$ storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices.