4 Papers

NA · Sep 4, 2012
An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

Nicolas Bock, Matt Challacombe

We present an optimized single-precision implementation of the Sparse Approximate Matrix Multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534 (2010)], a fast algorithm for matrix-matrix multiplication of matrices with decay that achieves $\mathcal{O}(n \log n)$ computational complexity with respect to the matrix dimension $n$. For dense quantum chemical matrices, we find that the max norm of the error achieved with a SpAMM tolerance below $2 \times 10^{-8}$ is lower than that of the single-precision SGEMM, while SpAMM outperforms SGEMM with a cross-over already at small matrix sizes ($n \sim 1000$). Our optimized version is significantly faster than naive implementations of SpAMM built on Intel's Math Kernel Library (MKL) or AMD's Core Math Library (ACML). Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2–3x speedups.

DS · Nov 15, 2010
Fast Multiplication of Matrices with Decay

Matt Challacombe, Nicolas Bock

A fast algorithm for the approximate multiplication of matrices with decay is introduced. The Sparse Approximate Matrix Multiply (SpAMM) reduces complexity in the product space, an approach different from current methods that economize within the matrix space through truncation or rank reduction. Matrix truncation (element dropping) is compared to SpAMM for quantum chemical matrices with approximate exponential and algebraic decay. For matched errors in the electronic total energy, SpAMM is found to require fewer, and often far fewer, floating point operations than dropping. The challenges and opportunities afforded by this new approach are discussed, including the potential for high performance implementations.
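To make "reducing complexity in the product space" concrete, here is a minimal Python sketch of a recursive quadtree multiply that drops a sub-product $A_{ik} B_{kj}$ whenever $\|A_{ik}\|_F \|B_{kj}\|_F \le \tau$, which bounds the Frobenius norm of the dropped term. The function name `spamm`, the tolerance `tau`, the `leaf` size, and the power-of-two dimension are assumptions made for illustration; this is not the authors' implementation, which works on blocked, precomputed quadtree norms rather than Python lists.

```python
import math
from functools import lru_cache


def spamm(A, B, tau, leaf=1):
    """Approximate C = A B by quadtree recursion with norm-product culling.

    Hypothetical sketch: A and B are square lists of lists whose dimension
    is a power of two, and `leaf` divides that dimension.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]

    @lru_cache(maxsize=None)
    def norm(which, r, c, size):
        # Frobenius norm of a size x size sub-block, memoized per block.
        M = A if which == "A" else B
        return math.sqrt(sum(M[i][j] ** 2
                             for i in range(r, r + size)
                             for j in range(c, c + size)))

    def recurse(ar, ac, br, bc, size):
        # ||A_ik B_kj||_F <= ||A_ik||_F ||B_kj||_F, so a small norm product
        # lets the whole sub-product be skipped.
        if norm("A", ar, ac, size) * norm("B", br, bc, size) <= tau:
            return
        if size <= leaf:
            # Dense leaf product accumulated into the C block at (ar, bc).
            for i in range(size):
                for k in range(size):
                    a = A[ar + i][ac + k]
                    for j in range(size):
                        C[ar + i][bc + j] += a * B[br + k][bc + j]
            return
        h = size // 2
        # The eight child products C_ij += A_ik B_kj of the quadtree.
        for i in (0, h):
            for j in (0, h):
                for k in (0, h):
                    recurse(ar + i, ac + k, br + k, bc + j, h)

    recurse(0, 0, 0, 0, n)
    return C


if __name__ == "__main__":
    n = 16
    A = [[0.5 ** abs(i - j) for j in range(n)] for i in range(n)]
    exact = spamm(A, A, 0.0)  # tau = 0 drops only exactly-zero products
    approx = spamm(A, A, 1e-4)
    err = max(abs(x - y) for rx, ry in zip(exact, approx) for x, y in zip(rx, ry))
    print(f"max-norm error at tau=1e-4: {err:.2e}")
```

Unlike element dropping, nothing is removed from $A$ or $B$ themselves; only products whose norm bound falls below $\tau$ are skipped, so the cost adapts to decay in the product.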

NA · Oct 20, 2015
Solvers for $\mathcal{O} (N)$ Electronic Structure in the Strong Scaling Limit

Nicolas Bock, Matt Challacombe, Laxmikant V. Kalé

We present a hybrid OpenMP/Charm++ framework for solving the $\mathcal{O}(N)$ Self-Consistent-Field eigenvalue problem with parallelism in the strong scaling regime, $P \gg N$, where $P$ is the number of cores and $N$ is a measure of system size, i.e. the number of matrix rows/columns, basis functions, atoms, molecules, etc. This result is achieved with a nested approach to Spectral Projection and the Sparse Approximate Matrix Multiply [Bock and Challacombe, SIAM J. Sci. Comput. 35 C72, 2013], and involves a recursive, task-parallel algorithm, often employed by generalized $N$-Body solvers, for the occlusion and culling of negligible products in the case of matrices with decay. Employing classic technologies associated with generalized $N$-Body solvers, including over-decomposition, recursive task parallelism, locality-preserving orderings, and persistence-based load balancing, we obtain scaling beyond hundreds of cores per molecule for small water clusters ([H${}_2$O]${}_N$, $N \in \{ 30, 90, 150 \}$, $P/N \approx \{ 819, 273, 164 \}$) and find support for increasingly strong scalability with increasing system size $N$.
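Spectral projection builds the density matrix from matrix products alone, which is why a fast approximate multiply is the natural kernel to parallelize. The sketch below assumes the second-order spectral projection (SP2) recursion, one common choice, and is not claimed to be the paper's exact scheme. It uses plain dense products where the framework above would use SpAMM, and the names `sp2`, `gershgorin_bounds`, `n_occ`, and `tol` are hypothetical.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def gershgorin_bounds(F):
    # Interval guaranteed to contain every eigenvalue of the symmetric matrix F.
    radius = [sum(abs(v) for j, v in enumerate(row) if j != i)
              for i, row in enumerate(F)]
    lo = min(F[i][i] - radius[i] for i in range(len(F)))
    hi = max(F[i][i] + radius[i] for i in range(len(F)))
    return lo, hi


def sp2(F, n_occ, tol=1e-10, max_iter=100):
    """Projector onto the n_occ lowest eigenstates of F (hypothetical SP2 sketch)."""
    n = len(F)
    lo, hi = gershgorin_bounds(F)
    # Map the spectrum of F into [0, 1], reversed so occupied states sit near 1.
    X = [[((hi if i == j else 0.0) - F[i][j]) / (hi - lo) for j in range(n)]
         for i in range(n)]
    for _ in range(max_iter):
        X2 = matmul(X, X)  # the product a SpAMM kernel would approximate
        if max(abs(X2[i][j] - X[i][j]) for i in range(n) for j in range(n)) < tol:
            break  # X is numerically idempotent, i.e. a projector
        tr_x, tr_x2 = trace(X), trace(X2)
        # X^2 lowers the trace, 2X - X^2 raises it; take the branch closer to n_occ.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            X = X2
        else:
            X = [[2.0 * X[i][j] - X2[i][j] for j in range(n)] for i in range(n)]
    return X
```

Each iteration costs one product, so with decay and SpAMM the work per step follows the product structure rather than $n^3$; the recursion over those products is what the task-parallel framework distributes.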

NA · Oct 14, 2015
A $N$-Body Solver for Square Root Iteration

Matt Challacombe, Terry Haut, Nicolas Bock

We develop the Sparse Approximate Matrix Multiply (SpAMM) $n$-body solver for first-order Newton-Schulz iteration of the matrix square root and inverse square root. The solver performs recursive two-sided metric queries on a modified Cauchy-Schwarz criterion, culling negligible sub-volumes of the product tensor for problems with structured decay in the sub-space metric. These sub-structures are shown to bound the relative error in the matrix-matrix product and, in favorable cases, to enjoy a reduced computational complexity governed by dimensionality reduction of the product volume. A main contribution is the demonstration of a new, algebraic locality that develops under contractive identity iteration, with collapse of the metric subspace onto the identity's plane diagonal, resulting in a stronger SpAMM bound. We also carry out first-order Fréchet analyses for single- and dual-channel instances of the square root iteration, and examine bifurcations due to ill-conditioning and overly aggressive SpAMM approximation. Finally, we show that extreme SpAMM approximation and contractive identity iteration can be achieved for ill-conditioned systems through regularization, and we demonstrate the potential for acceleration with a scoping, product representation of the inverse factor.
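For orientation, the sketch below implements the textbook coupled Newton-Schulz iteration, $T_k = \tfrac{1}{2}(3I - Z_k Y_k)$, $Y_{k+1} = Y_k T_k$, $Z_{k+1} = T_k Z_k$, with $Y_0 = A/s$ and $Z_0 = I$, so that $Y_k \to (A/s)^{1/2}$, $Z_k \to (A/s)^{-1/2}$, and $Z_k Y_k \to I$ (the identity iteration referred to above). It is a sketch assuming a symmetric positive definite $A$ and a Frobenius-norm scaling $s$; it uses exact dense products rather than SpAMM, and the names `newton_schulz_sqrt`, `tol`, and `max_iter` are hypothetical.

```python
import math


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def newton_schulz_sqrt(A, tol=1e-12, max_iter=100):
    """Return (A^{1/2}, A^{-1/2}) for symmetric positive definite A (sketch)."""
    n = len(A)
    # ||A||_F bounds the largest eigenvalue, so A/s has its spectrum in (0, 1].
    s = math.sqrt(sum(a * a for row in A for a in row))
    Y = [[a / s for a in row] for row in A]
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        ZY = matmul(Z, Y)
        # Stop once the identity iteration Z Y -> I has converged.
        if max(abs(ZY[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n)) < tol:
            break
        T = [[((3.0 if i == j else 0.0) - ZY[i][j]) / 2.0 for j in range(n)]
             for i in range(n)]
        Y = matmul(Y, T)  # Y -> (A/s)^{1/2}
        Z = matmul(T, Z)  # Z -> (A/s)^{-1/2}
    r = math.sqrt(s)
    return ([[y * r for y in row] for row in Y],
            [[z / r for z in row] for row in Z])
```

Each step needs three products ($ZY$, $YT$, $TZ$). Those are the products a SpAMM solver would approximate, and as $Z_k Y_k$ approaches $I$ its mass concentrates on the diagonal, which is the locality the abstract exploits.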