DCOct 26, 2016
The Reverse Cuthill-McKee Algorithm in Distributed-MemoryAriful Azad, Mathias Jacquelin, Aydin Buluc et al.
Ordering vertices of a graph is key to minimize fill-in and data structure size in sparse direct solvers, maximize locality in iterative solvers, and improve performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix decomposition. We achieve high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives. Our implementation shows strong scaling up to 1024 cores for smaller matrices and up to 4096 cores for larger matrices.
LGOct 16, 2021
Deep Learning and Spectral Embedding for Graph PartitioningAlice Gatti, Zhixiong Hu, Tess Smidt et al.
We present a graph bisection and partitioning algorithm based on graph neural networks. For each node in the graph, the network outputs probabilities for each of the partitions. The graph neural network consists of two modules: an embedding phase and a partitioning phase. The embedding phase is trained first by minimizing a loss function inspired by spectral graph theory. The partitioning module is trained through a loss function that corresponds to the expected value of the normalized cut. Both parts of the neural network rely on SAGE convolutional layers and graph coarsening using heavy edge matching. The multilevel structure of the neural network is inspired by the multigrid algorithm. Our approach generalizes very well to bigger graphs and has partition quality comparable to METIS, Scotch and spectral partitioning, with shorter runtime compared to METIS and spectral partitioning.
LGApr 8, 2021
Graph Partitioning and Sparse Matrix Ordering using Reinforcement Learning and Graph Neural NetworksAlice Gatti, Zhixiong Hu, Tess Smidt et al.
We present a novel method for graph partitioning, based on reinforcement learning and graph convolutional neural networks. Our approach is to recursively partition coarser representations of a given graph. The neural network is implemented using SAGE graph convolution layers, and trained using an advantage actor critic (A2C) agent. We present two variants, one for finding an edge separator that minimizes the normalized cut or quotient cut, and one that finds a small vertex separator. The vertex separators are then used to construct a nested dissection ordering to permute a sparse matrix so that its triangular factorization will incur less fill-in. The partitioning quality is compared with partitions obtained using METIS and SCOTCH, and the nested dissection ordering is evaluated in the sparse solver SuperLU. Our results show that the proposed method achieves similar partitioning quality as METIS and SCOTCH. Furthermore, the method generalizes across different classes of graphs, and works well on a variety of graphs from the SuiteSparse sparse matrix collection.
NUCL-THOct 6, 2018
Deep learning: Extrapolation tool for ab initio nuclear theoryGianina Alina Negoita, James P. Vary, Glenn R. Luecke et al.
Ab initio approaches in nuclear theory, such as the no-core shell model (NCSM), have been developed for approximately solving finite nuclei with realistic strong interactions. The NCSM and other approaches require an extrapolation of the results obtained in a finite basis space to the infinite basis space limit and assessment of the uncertainty of those extrapolations. Each observable requires a separate extrapolation and most observables have no proven extrapolation method. We propose a feed-forward artificial neural network (ANN) method as an extrapolation tool to obtain the ground state energy and the ground state point-proton root-mean-square (rms) radius along with their extrapolation uncertainties. The designed ANNs are sufficient to produce results for these two very different observables in $^6$Li from the ab initio NCSM results in small basis spaces that satisfy the following theoretical physics condition: independence of basis space parameters in the limit of extremely large matrices. Comparisons of the ANN results with other extrapolation methods are also provided.
NASep 8, 2017
Accelerating Nuclear Configuration Interaction Calculations through a Preconditioned Block Iterative EigensolverMeiyue Shao, Hasan Metin Aktulga, Chao Yang et al.
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.