Meiyue Shao

NA
8 papers
163 citations
Novelty 34%
AI Score 39

8 Papers

NA May 7, 2016
Low Rank Approximation in $G_0W_0$ Approximation

Meiyue Shao, Lin Lin, Chao Yang et al.

The single particle energies obtained in a Kohn--Sham density functional theory (DFT) calculation are generally known to be poor approximations to electron excitation energies that are measured in transport, tunneling and spectroscopic experiments such as photo-emission spectroscopy. The correction to these energies can be obtained from the poles of a single particle Green's function derived from a many-body perturbation theory. From a computational perspective, the accuracy and efficiency of such an approach depend on how a self energy term that properly accounts for dynamic screening of electrons is approximated. The $G_0W_0$ approximation is a widely used technique in which the self energy is expressed as the convolution of a non-interacting Green's function ($G_0$) and a screened Coulomb interaction ($W_0$) in the frequency domain. The computational cost associated with such a convolution is high due to the high complexity of evaluating $W_0$ at multiple frequencies. In this paper, we discuss how the cost of a $G_0W_0$ calculation can be reduced by constructing a low rank approximation to the frequency dependent part of $W_0$. In particular, we examine the effect of such a low rank approximation on the accuracy of the $G_0W_0$ approximation. We also discuss how the numerical convolution of $G_0$ and $W_0$ can be evaluated efficiently and accurately by using a contour deformation technique with an appropriate choice of the contour.

NA Apr 24, 2018
A robust and efficient implementation of LOBPCG

Jed A. Duersch, Meiyue Shao, Chao Yang et al.

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is widely used to compute eigenvalues of large sparse symmetric matrices. The algorithm can suffer from numerical instability if it is not implemented with care. This is especially problematic when the number of eigenpairs to be computed is relatively large. To enhance robustness, we propose in this paper an improved basis selection strategy based on earlier work by Hetmaniuk and Lehoucq, as well as a backward stable convergence criterion. We also suggest several algorithmic optimizations that improve the performance of practical LOBPCG implementations. Numerical examples confirm that our approach consistently and significantly outperforms previous competing approaches in both stability and speed.

MS Dec 23, 2016
BSEPACK User's Guide

Meiyue Shao, Chao Yang

This is the user manual for the software package BSEPACK (Bethe--Salpeter Eigenvalue Solver Package).

58.6 NA Mar 12
Mixed precision thin SVD algorithms based on the Gram matrix

Erin Carson, Yuxin Ma, Meiyue Shao

In this work, we present a mixed precision algorithm that leverages the Gram matrix and Jacobi methods to compute the singular value decomposition (SVD) of tall-and-skinny matrices. By constructing the Gram matrix in higher precision and coupling it with a Jacobi algorithm, our theoretical analysis and numerical experiments both indicate that the singular values computed by this mixed precision thin SVD algorithm attain high relative accuracy. In practice, our mixed precision thin SVD algorithm yields speedups of over 10x on a single CPU and about 2x on distributed memory systems when compared with traditional thin SVD methods.
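The Gram-matrix route described in this abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: the input is stored in float32 and the Gram matrix is formed in float64 (the paper's actual precision pair may differ), `np.linalg.eigh` stands in for the Jacobi eigensolver, and the function name `gram_thin_svd` is hypothetical.

```python
import numpy as np


def gram_thin_svd(a_low):
    """Thin SVD of a tall-and-skinny matrix via its Gram matrix.

    a_low: (m, n) float32 array with m >= n and full column rank.
    Returns U (m, n), s (n,), Vt (n, n), all in float64.
    """
    a_high = a_low.astype(np.float64)   # promote to the higher precision
    gram = a_high.T @ a_high            # small n-by-n Gram matrix in float64
    # Assumption: eigh replaces the Jacobi eigensolver used in the paper.
    evals, v = np.linalg.eigh(gram)     # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]     # reorder to descending
    evals, v = evals[order], v[:, order]
    s = np.sqrt(np.clip(evals, 0.0, None))  # singular values of A
    u = (a_high @ v) / s                # left singular vectors, column-scaled
    return u, s, v.T


rng = np.random.default_rng(0)
a = rng.standard_normal((200, 5)).astype(np.float32)
u, s, vt = gram_thin_svd(a)
```

Only the small Gram matrix is decomposed, so the expensive work is one matrix product with the tall matrix. Forming the Gram matrix squares the condition number, which is why this sketch carries it in the higher precision.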

18.0 NA Mar 23
On Two-Stage Householder Orthogonalization

Zhuang-Ao He, Meiyue Shao

Two-stage orthogonalization is essential in numerical algorithms such as Krylov subspace methods. For this task we need to orthogonalize a matrix $A$ against another matrix $V$ with orthonormal columns. A common approach is to employ the block Gram--Schmidt algorithm. However, its stability largely depends on the condition number of $[V,A]$. While performing a Householder orthogonalization on $[V,A]$ is unconditionally stable, it does not utilize the knowledge that $V$ has orthonormal columns. To address these issues, we propose a two-stage Householder orthogonalization algorithm based on the generalized Householder transformation. Instead of explicitly orthogonalizing the entire $V$, our algorithm only needs to orthogonalize a square submatrix of $V$. Theoretical analysis and numerical experiments demonstrate that our method is also unconditionally stable.
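For orientation, the block Gram--Schmidt baseline that this abstract mentions can be sketched in a few lines of NumPy. This is not the proposed two-stage Householder algorithm; it only illustrates the task of orthogonalizing $A$ against $V$. The function name `bgs_orthogonalize` and the choice of two passes are assumptions made for this sketch.

```python
import numpy as np


def bgs_orthogonalize(v, a, passes=2):
    """Orthogonalize the columns of `a` against `v`, then among themselves.

    v: (m, k) array with orthonormal columns.
    a: (m, p) array with k + p <= m and [v, a] of full column rank.
    Returns q (m, p) with orthonormal columns satisfying v.T @ q ~ 0.
    """
    q = np.array(a, dtype=np.float64)   # work on a float64 copy
    for _ in range(passes):
        q -= v @ (v.T @ q)              # first stage: project out range(V)
        q, _ = np.linalg.qr(q)          # second stage: orthonormalize the rest
    return q


rng = np.random.default_rng(0)
v, _ = np.linalg.qr(rng.standard_normal((50, 4)))
a = rng.standard_normal((50, 3))
q = bgs_orthogonalize(v, a)
```

The second pass is a reorthogonalization step. As the abstract notes, the quality of this kind of scheme depends on the conditioning of $[V,A]$, which is the weakness the Householder-based method targets.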

NA Sep 8, 2017
Accelerating Nuclear Configuration Interaction Calculations through a Preconditioned Block Iterative Eigensolver

Meiyue Shao, Hasan Metin Aktulga, Chao Yang et al.

We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of the special structure of the nuclear configuration interaction problem, which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos-based algorithm for problems of moderate size on a Cray XC30 system.

NA Sep 5, 2017
A structure preserving Lanczos algorithm for computing the optical absorption spectrum

Meiyue Shao, Felipe H. da Jornada, Lin Lin et al.

We present a new structure preserving Lanczos algorithm for approximating the optical absorption spectrum in the context of solving the full Bethe--Salpeter equation without the Tamm--Dancoff approximation. The new algorithm is based on a structure preserving Lanczos procedure, which exploits the special block structure of Bethe--Salpeter Hamiltonian matrices. A recently developed technique of generalized averaged Gauss quadrature is incorporated to accelerate the convergence. We also establish the connection between our structure preserving Lanczos procedure and several existing Lanczos procedures developed in different contexts. Numerical examples are presented to demonstrate the effectiveness of our Lanczos algorithm.

NA Sep 18, 2015
Structure Preserving Parallel Algorithms for Solving the Bethe-Salpeter Eigenvalue Problem

Meiyue Shao, Felipe H. da Jornada, Chao Yang et al.

The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue problem arising from the discretized Bethe-Salpeter equation in the context of computing exciton energies and states. A computational challenge is that at least half of the eigenvalues and the associated eigenvectors are desired in practice. We establish the equivalence between Bethe-Salpeter eigenvalue problems and real Hamiltonian eigenvalue problems. Based on theoretical analysis, structure preserving algorithms for a class of Bethe-Salpeter eigenvalue problems are proposed. We also show that for this class of problems all eigenvalues obtained from the Tamm-Dancoff approximation are overestimated. In order to solve large scale problems of practical interest, we discuss parallel implementations of our algorithms targeting distributed memory systems. Several numerical examples are presented to demonstrate the efficiency and accuracy of our algorithms.
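The overestimation claim in this abstract can be checked numerically on a small example. The sketch below makes assumptions that the abstract does not state: it uses the real-valued special case $H = \begin{bmatrix} A & B \\ -B & -A \end{bmatrix}$ with symmetric $A$ and $B$ such that $A+B$ and $A-B$ are positive definite, and it takes the Tamm-Dancoff approximation to mean dropping $B$, so that the approximate eigenvalues are those of $A$. The helper names are hypothetical, and the Cholesky-based reduction is a simple stand-in rather than the paper's algorithm.

```python
import numpy as np


def random_spd(n, rng):
    """Random symmetric positive definite matrix (illustrative helper)."""
    x = rng.standard_normal((n, n))
    return x @ x.T + n * np.eye(n)


def bse_positive_eigenvalues(a, b):
    """Positive eigenvalues of H = [[A, B], [-B, -A]], in ascending order.

    Assumes real symmetric A, B with M = A + B and K = A - B positive definite.
    Every eigenvalue lam of H satisfies lam**2 in spec(M K), and M K has the
    same spectrum as the symmetric matrix L.T @ M @ L, where K = L @ L.T.
    """
    m, k = a + b, a - b
    chol = np.linalg.cholesky(k)
    lam_sq = np.linalg.eigvalsh(chol.T @ m @ chol)
    return np.sqrt(lam_sq)


rng = np.random.default_rng(0)
n = 6
m, k = random_spd(n, rng), random_spd(n, rng)
a, b = (m + k) / 2, (m - k) / 2          # guarantees A + B and A - B are SPD
bse = bse_positive_eigenvalues(a, b)     # eigenvalues of the full problem
tda = np.linalg.eigvalsh(a)              # Tamm-Dancoff: B set to zero
overestimated = bool(np.all(bse <= tda + 1e-10))
```

Working with the symmetric matrix `chol.T @ m @ chol` keeps the computed eigenvalues real, whereas a general dense eigensolver applied to $H$ can return spurious imaginary parts.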