58.5NAMay 6
A block Recycled GMRES method with investigations into aspects of solver performanceMichael L. Parks, Kirk M. Soodhalter, Daniel B. Szyld
We propose a block Krylov subspace version of the GCRO-DR method proposed in [Parks et al.; SISC 2005], which is an iterative method allowing for the efficient minimization of the the residual over an augmented Krylov subspace. We offer a clean derivation of our proposed method and discuss methods of selecting recycling subspaces at restart as well as implementation decisions in the context of high-performance computing. Numerical experiments are split into those demonstrating convergence properties and those demonstrating the data movement and cache efficiencies of the dominant operations of the method, measured using processor monitoring code from Intel.
NAMar 29, 2018
Error Analysis and Improving the Accuracy of Winograd Convolution for Deep Neural NetworksBarbara Barabasz, Andrew Anderson, Kirk M. Soodhalter et al.
Popular deep neural networks (DNNs) spend the majority of their execution time computing convolutions. The Winograd family of algorithms can greatly reduce the number of arithmetic operations required and is present in many DNN software frameworks. However, the performance gain is at the expense of a reduction in floating point (FP) numerical accuracy. In this paper, we analyse the worst case FP error and prove the estimation of norm and conditioning of the algorithm. We show that the bound grows exponentially with the size of the convolution, but the error bound of the \textit{modified} algorithm is smaller than the original one. We propose several methods for reducing FP error. We propose a canonical evaluation ordering based on Huffman coding that reduces summation error. We study the selection of sampling "points" experimentally and find empirically good points for the most important sizes. We identify the main factors associated with good points. In addition, we explore other methods to reduce FP error, including mixed-precision convolution, and pairwise summation across DNN channels. Using our methods we can significantly reduce FP error for a given block size, which allows larger block sizes and reduced computation.
NASep 7, 2016
A modified implementation of MINRES to monitor residual subvector norms for block systemsRoland Herzog, Kirk M. Soodhalter
Saddle-point systems, i.e., structured linear systems with symmetric matrices are considered. A modified implementation of (preconditioned) MINRES is derived which allows to monitor the norms of the subvectors individually. Compared to the implementation from the textbook of [Elman, Sylvester and Wathen, Oxford University Press, 2014], our method requires one extra vector of storage and no additional applications of the preconditioner. Numerical experiments are included.