Shengguo Li

NA
6papers
119citations
Novelty35%
AI Score36

6 Papers

MSDec 22, 2016
An efficient hybrid tridiagonal divide-and-conquer algorithm on distributed memory architectures

Shengguo Li, Francois-Henry Rouet, Jie Liu et al.

In this paper, an efficient divide-and-conquer (DC) algorithm is proposed for the symmetric tridiagonal matrices based on ScaLAPACK and the hierarchically semiseparable (HSS) matrices. HSS is an important type of rank-structured matrices.Most time of the DC algorithm is cost by computing the eigenvectors via the matrix-matrix multiplications (MMM). In our parallel hybrid DC (PHDC) algorithm, MMM is accelerated by using the HSS matrix techniques when the intermediate matrix is large. All the HSS algorithms are done via the package STRUMPACK. PHDC has been tested by using many different matrices. Compared with the DC implementation in MKL, PHDC can be faster for some matrices with few deflations when using hundreds of processes. However, the gains decrease as the number of processes increases. The comparisons of PHDC with ELPA (the Eigenvalue soLvers for Petascale Applications library) are similar. PHDC is usually slower than MKL and ELPA when using 300 or more processes on Tianhe-2 supercomputer.

NAMar 2, 2014
An improved dqds algorithm

Shengguo Li, Ming Gu, Beresford N. Parlett

In this paper we present an improved dqds algorithm for computing all the singular values of a bidiagonal matrix to high relative accuracy. There are two key contributions: a novel deflation strategy that improves the convergence for badly scaled matrices, and some modifications to certain shift strategies that accelerate the convergence for most bidiagonal matrices. These techniques together ensure linear worst case complexity of the improved algorithm (denoted by V5). Our extensive numerical experiments indicate that V5 is typically 1.2x--4x faster than DLASQ (the LAPACK-3.4.0 implementation of dqds) without any degradation in accuracy. On matrices for which DLASQ shows very slow convergence, V5 can be 3x--10x faster. At the end of this paper, a hybrid algorithm (HDLASQ) is developed by combining our improvements with the aggressive early deflation strategy (AggDef2 in [SIAM J. Matrix Anal. Appl., 33(2012), 22-51]). Numerical results show that HDLASQ is the fastest among these different versions.

78.0NAApr 1
Adaptive Polynomial Filtering for Hermitian Interior Eigenproblems: Convergence Analysis

Xiaofei Xu, Yuhui Ni, Shengguo Li et al.

Interior eigenvalue problems for large-scale sparse Hermitian matrices are fundamental in computational science. We propose an adaptive polynomial filtering strategy based on Chebyshev expansion of a step function, integrated into a filtered subspace iteration framework. We establish pointwise convergence bounds in both undamped and damped settings and incorporate an enhanced spurious eigenvalue detection technique to improve efficiency and robustness. At the implementation level, we employ MaSpMM to accelerate the polynomial filtering step. Numerical results demonstrate the efficiency and robustness of the proposed method compared with classical approaches.

OCJul 23, 2019
Heavy-ball Algorithms Always Escape Saddle Points

Tao Sun, Dongsheng Li, Zhe Quan et al.

Nonconvex optimization algorithms with random initialization have attracted increasing attention recently. It has been showed that many first-order methods always avoid saddle points with random starting points. In this paper, we answer a question: can the nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is yes! Direct using the existing proof technique for the heavy-ball algorithms is hard due to that each iteration of the heavy-ball algorithm consists of current and last points. It is impossible to formulate the algorithms as iteration like xk+1= g(xk) under some mapping g. To this end, we design a new mapping on a new space. With some transfers, the heavy-ball algorithm can be interpreted as iterations after this mapping. Theoretically, we prove that heavy-ball gradient descent enjoys larger stepsize than the gradient descent to escape saddle points to escape the saddle point. And the heavy-ball proximal point algorithm is also considered; we also proved that the algorithm can always escape the saddle point.

NAOct 15, 2015
New fast divide-and-conquer algorithms for the symmetric tridiagonal eigenvalue problem

Shengguo Li, Xiangke Liao, Jie Liu et al.

In this paper, two accelerated divide-and-conquer algorithms are proposed for the symmetric tridiagonal eigenvalue problem, which cost $O(N^2r)$ {flops} in the worst case, where $N$ is the dimension of the matrix and $r$ is a modest number depending on the distribution of eigenvalues. Both of these algorithms use hierarchically semiseparable (HSS) matrices to approximate some intermediate eigenvector matrices which are Cauchy-like matrices and are off-diagonally low-rank. The difference of these two versions lies in using different HSS construction algorithms, one (denoted by {ADC1}) uses a structured low-rank approximation method and the other ({ADC2}) uses a randomized HSS construction algorithm. For the ADC2 algorithm, a method is proposed to estimate the off-diagonal rank. Numerous experiments have been done to show their stability and efficiency. These algorithms are implemented in parallel in a shared memory environment, and some parallel implementation details are included. Comparing the ADCs with highly optimized multithreaded libraries such as Intel MKL, we find that ADCs could be more than 6x times faster for some large matrices with few deflations.