COMP-PHJun 30, 2016
Parameterless stopping criteria for recursive density matrix expansionsAnastasia Kruchinina, Elias Rudberg, Emanuel H. Rubensson
Parameterless stopping criteria for recursive polynomial expansions to construct the density matrix in electronic structure calculations are proposed. Based on convergence order estimation the new stopping criteria automatically and accurately detect when the calculation is dominated by numerical errors and continued iteration does not improve the result. Difficulties in selecting a stopping tolerance and appropriately balancing it in relation to parameters controlling the numerical accuracy are avoided. Thus, our parameterless stopping criteria stands in contrast to the standard approach to stop as soon as some error measure goes below a user-defined parameter or tolerance. We demonstrate that the stopping criteria work well both in dense and sparse matrix calculations and in large-scale self-consistent field calculations with the quantum chemistry program Ergo (www.ergoscf.org).
NAApr 10, 2019
Localized inverse factorizationEmanuel H. Rubensson, Anton G. Artemov, Anastasia Kruchinina et al.
We propose a localized divide and conquer algorithm for inverse factorization $S^{-1} = ZZ^*$ of Hermitian positive definite matrices $S$ with localized structure, e.g. exponential decay with respect to some given distance function on the index set of $S$. The algorithm is a reformulation of recursive inverse factorization [J. Chem. Phys., 128 (2008), 104105] but makes use of localized operations only. At each level of recursion, the problem is cut into two subproblems and their solutions are combined using iterative refinement [Phys. Rev. B, 70 (2004), 193102] to give a solution to the original problem. The two subproblems can be solved in parallel without any communication and, using the localized formulation, the cost of combining their results is proportional to the cut size, defined by the binary partition of the index set. This means that for cut sizes increasing as $o(n)$ with system size $n$ the cost of combining the two subproblems is negligible compared to the overall cost for sufficiently large systems. We also present an alternative derivation of iterative refinement based on a sign matrix formulation, analyze the stability, and propose a parameterless stopping criterion. We present bounds for the initial factorization error and the number of iterations in terms of the condition number of $S$ when the starting guess is given by the solution of the two subproblems in the binary recursion. These bounds are used in theoretical results for the decay properties of the involved matrices. The localization properties of our algorithms are demonstrated for matrices corresponding to nearest neighbor overlap on one-, two-, and three-dimensional lattices as well as basis set overlap matrices generated using the Hartree-Fock and Kohn-Sham density functional theory electronic structure program Ergo [SoftwareX, 7 (2018), 107].
DCOct 28, 2012
Chunks and Tasks: a programming model for parallelization of dynamic algorithmsEmanuel H. Rubensson, Elias Rudberg
We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and Tasks library maps the chunks and tasks to physical resources. In this way we seek to combine user friendliness with high performance. An application programmer can express a parallel algorithm using a few simple building blocks, defining data and work objects and their relationships. No explicit communication calls are needed; the distribution of both work and data is handled by the Chunks and Tasks library. This makes efficient implementation of complex applications that require dynamic distribution of work and data easier. At the same time, Chunks and Tasks imposes restrictions on data access and task dependencies that facilitates the development of high performance parallel back ends. We discuss the fundamental abstractions underlying the programming model, as well as performance and fault resilience considerations. We also present a pilot C++ library implementation for clusters of multicore machines and demonstrate its performance for sparse blocked matrix-matrix multiplication.