Stefano Massei

15papers

206citations

Novelty42%

AI Score43

Ranked #80,745 of 205,806 authors (top 39%)#467 in NA (top 14%)

15 Papers

NAMay 29, 2019

Low-rank updates and a divide-and-conquer method for linear matrix equations

Daniel Kressner, Stefano Massei, Leonardo Robol

Linear matrix equations, such as the Sylvester and Lyapunov equations, play an important role in various applications, including the stability analysis and dimensionality reduction of linear dynamical control systems and the solution of partial differential equations. In this work, we present and analyze a new algorithm, based on tensorized Krylov subspaces, for quickly updating the solution of such a matrix equation when its coefficients undergo low-rank changes. We demonstrate how our algorithm can be utilized to accelerate the Newton method for solving continuous-time algebraic Riccati equations. Our algorithm also forms the basis of a new divide-and-conquer approach for linear matrix equations with coefficients that feature hierarchical low-rank structure, such as HODLR, HSS, and banded matrices. Numerical experiments demonstrate the advantages of divide-and-conquer over existing approaches, in terms of computational time and memory consumption.

NANov 24, 2016

Semi-Infinite Quasi-Toeplitz Matrices with Applications to QBD Stochastic Processes

Dario A. Bini, Stefano Massei, Beatrice Meini

Denote by $\mathcal{W}_1$ the set of complex valued functions of the form $a(z)=\sum_{i=-\infty}^{+\infty}a_iz^i$ which are continuous on the unit circle, and such that $\sum_{i=-\infty}^{+\infty}|ia_i|<\infty$. We call CQT matrix a quasi-Toeplitz matrix $A$, associated with a continuous symbol $a(z)\in\mathcal W_1$, of the form $A=T(a)+E$, where $T(a)=(t_{i,j})_{i,j\in\mathbb{Z}^+}$ is the semi-infinite Toeplitz matrix such that $t_{i,j}=a_{j-i}$, for $i,j\in\mathbb Z^+$, and $E=(e_{i,j})_{i,j\in\mathbb{Z}^+}$ is a semi-infinite matrix such that $\sum_{i,j=1}^{+\infty}|e_{i,j}|$ is finite. We prove that the class of CQT matrices is a Banach algebra with a suitable sub-multiplicative matrix norm $\|\cdot\|$. We introduce a finite representation of CQT matrices together with algorithms which implement elementary matrix operations. An application to solving quadratic matrix equations of the kind $AX^2+BX+C=0$, encountered in the solution of Quasi-Birth and Death (QBD) stochastic processes with a denumerable set of phases, is presented where $A,B,C$ are CQT matrices.

NAJun 13, 2018

Quasi-Toeplitz matrix arithmetic: a MATLAB toolbox

Dario A. Bini, Stefano Massei, Leonardo Robol

A Quasi Toeplitz (QT) matrix is a semi-infinite matrix of the kind $A=T(a)+E$ where $T(a)=(a_{j-i})_{i,j\in\mathbb Z^+}$, $E=(e_{i,j})_{i,j\in\mathbb Z^+}$ is compact and the norms $\lVert a\rVert_{\mathcal W} = \sum_{i\in\mathbb Z}|a_i|$ and $\lVert E \rVert_2$ are finite. These properties allow to approximate any QT-matrix, within any given precision, by means of a finite number of parameters. QT-matrices, equipped with the norm $\lVert A \rVert_{\mathcal QT}=α\lVert a\rVert_{\mathcal{W}} \lVert E \rVert_2$, for $α= (1+\sqrt 5)/2$, are a Banach algebra with the standard arithmetic operations. We provide an algorithmic description of these operations on the finite parametrization of QT-matrices, and we develop a MATLAB toolbox implementing them in a transparent way. The toolbox is then extended to perform arithmetic operations on matrices of finite size that have a Toeplitz plus low-rank structure. This enables the development of algorithms for Toeplitz and quasi-Toeplitz matrices whose cost does not necessarily increase with the dimension of the problem. Some examples of applications to computing matrix functions and to solving matrix equations are presented, and confirm the effectiveness of the approach.

NAFeb 6, 2019

On maximum volume submatrices and cross approximation for symmetric semidefinite and diagonally dominant matrices

Alice Cortinovis, Daniel Kressner, Stefano Massei

The problem of finding a $k \times k$ submatrix of maximum volume of a matrix $A$ is of interest in a variety of applications. For example, it yields a quasi-best low-rank approximation constructed from the rows and columns of $A$. We show that such a submatrix can always be chosen to be a principal submatrix if $A$ is symmetric semidefinite or diagonally dominant. Then we analyze the low-rank approximation error returned by a greedy method for volume maximization, cross approximation with complete pivoting. Our bound for general matrices extends an existing result for symmetric semidefinite matrices and yields new error estimates for diagonally dominant matrices. In particular, for doubly diagonally dominant matrices the error is shown to remain within a modest factor of the best approximation error. We also illustrate how the application of our results to cross approximation for functions leads to new and better convergence results.

NANov 25, 2016

On Functions of quasi Toeplitz matrices

Dario A. Bini, Stefano Massei, Beatrice Meini

Let $a(z)=\sum_{i\in\mathbb Z}a_iz^i$ be a complex valued continuous function, defined for $|z|=1$, such that $\sum_{i=-\infty}^{+\infty}|ia_i|<\infty$. Consider the semi-infinite Toeplitz matrix $T(a)=(t_{i,j})_{i,j\in\mathbb Z^+}$ associated with the symbol $a(z)$ such that $t_{i,j}=a_{j-i}$. A quasi-Toeplitz matrix associated with the continuous symbol $a(z)$ is a matrix of the form $A=T(a)+E$ where $E=(e_{i,j})$, $\sum_{i,j\in\mathbb Z^+}|e_{i,j}|<\infty$, and is called a CQT-matrix. Given a function $f(x)$ and a CQT matrix $M$, we provide conditions under which $f(M)$ is well defined and is a CQT matrix. Moreover, we introduce a parametrization of CQT matrices and algorithms for the computation of $f(M)$. We treat the case where $f(x)$ is assigned in terms of power series and the case where $f(x)$ is defined in terms of a Cauchy integral. This analysis is applied also to finite matrices which can be written as the sum of a Toeplitz matrix and of a low rank correction.

NAAug 22, 2018

Solving rank structured Sylvester and Lyapunov equations

Stefano Massei, Davide Palitta, Leonardo Robol

We consider the problem of efficiently solving Sylvester and Lyapunov equations of medium and large scale, in case of rank-structured data, i.e., when the coefficient matrices and the right-hand side have low-rank off-diagonal blocks. This comprises problems with banded data, recently studied by Haber and Verhaegen in "Sparse solution of the Lyapunov equation for large-scale interconnected systems", Automatica, 2016, and by Palitta and Simoncini in "Numerical methods for large-scale Lyapunov equations with symmetric banded data", SISC, 2018, which often arise in the discretization of elliptic PDEs. We show that, under suitable assumptions, the quasiseparable structure is guaranteed to be numerically present in the solution, and explicit novel estimates of the numerical rank of the off-diagonal blocks are provided. Efficient solution schemes that rely on the technology of hierarchical matrices are described, and several numerical experiments confirm the applicability and efficiency of the approaches. We develop a MATLAB toolbox that allows easy replication of the experiments and a ready-to-use interface for the solvers. The performances of the different approaches are compared, and we show that the new methods described are efficient on several classes of relevant problems.

NADec 10, 2016

Decay bounds for the numerical quasiseparable preservation in matrix functions

Stefano Massei, Leonardo Robol

Given matrices $A$ and $B$ such that $B=f(A)$, where $f(z)$ is a holomorphic function, we analyze the relation between the singular values of the off-diagonal submatrices of $A$ and $B$. We provide family of bounds which depend on the interplay between the spectrum of the argument $A$ and the singularities of the function. In particular, these bounds guarantee the numerical preservation of quasiseparable structures under mild hypotheses. We extend the Dunford-Cauchy integral formula to the case in which some poles are contained inside the contour of integration. We use this tool together with the technology of hierarchical matrices ($\mathcal H$-matrices) for the effective computation of matrix functions with quasiseparable arguments.

MLJan 23, 2023

On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality

Lu Xia, Michiel E. Hochstenbach, Stefano Massei

When training neural networks with low-precision computation, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers; in this paper we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-\Lojasiewicz inequality. Within this context, we show that, in contrast, biased stochastic rounding errors may be beneficial since choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias in a descent direction. Furthermore, we obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performances of various rounding strategies when optimizing several examples using low-precision fixed-point number formats.

NAJan 5, 2016

Efficient cyclic reduction for QBDs with rank structured blocks

Dario A. Bini, Stefano Massei, Leonardo Robol

We provide effective algorithms for solving block tridiagonal block Toeplitz systems with $m\times m$ quasiseparable blocks, as well as quadratic matrix equations with $m\times m$ quasiseparable coefficients, based on cyclic reduction and on the technology of rank-structured matrices. The algorithms rely on the exponential decay of the singular values of the off-diagonal submatrices generated by cyclic reduction. We provide a formal proof of this decay in the Markovian framework. The results of the numerical experiments that we report confirm a significant speed up over the general algorithms, already starting with the moderately small size $m\approx 10^2$.

4.8PRMay 7

Computing the density of the Kesten-Stigum limit in supercritical Galton-Watson processes

Alice Cortinovis, Sophie Hautphenne, Stefano Massei

This paper proposes a novel numerical method for computing the density of the limit random variable associated with a supercritical Galton-Watson process. This random variable captures the effect of early demographic fluctuations and determines the random amplitude of long-term exponential population growth. While the existence of a non-trivial limit is ensured by the Kesten-Stigum theorem, computing its density in a stable and efficient manner for arbitrary offspring laws remains a significant challenge. The proposed approach leverages a functional equation that characterizes the Laplace-Stieltjes transform of the limit distribution and combines it with a moment-matching method to obtain accurate approximations within a class of linear combinations of Laguerre polynomials with exponential damping. The effectiveness of the approach is validated on several examples in which the offspring generating function is a polynomial of bounded degree.

33.0NAMar 20

Error formulas for block rational Krylov approximations of matrix functions

Stefano Massei, Leonardo Robol

This paper investigates explicit expressions for the error associated with the block rational Krylov approximation of matrix functions. Two formulas are proposed, both derived from characterizations of the block FOM residual. The first formula employs a block generalization of the residual polynomial, while the second leverages the block collinearity of the residuals. A posteriori error bounds based on the knowledge of spectral information of the argument are derived and tested on a set of examples. Notably, both error formulas and their corresponding upper bounds do not require the use of quadratures for their practical evaluation.

MLDec 23, 2023

AdamL: A fast adaptive gradient method incorporating loss function

Lu Xia, Stefano Massei

Adaptive first-order optimizers are fundamental tools in deep learning, although they may suffer from poor generalization due to the nonuniform gradient scaling. In this work, we propose AdamL, a novel variant of the Adam optimizer, that takes into account the loss function information to attain better generalization results. We provide sufficient conditions that together with the Polyak-Lojasiewicz inequality, ensure the linear convergence of AdamL. As a byproduct of our analysis, we prove similar convergence properties for the EAdam, and AdaBelief optimizers. Experimental results on benchmark functions show that AdamL typically achieves either the fastest convergence or the lowest objective function values when compared to Adam, EAdam, and AdaBelief. These superior performances are confirmed when considering deep learning tasks such as training convolutional neural networks, training generative adversarial networks using vanilla convolutional neural networks, and long short-term memory networks. Finally, in the case of vanilla convolutional neural networks, AdamL stands out from the other Adam's variants and does not require the manual adjustment of the learning rate during the later stage of the training.

LGFeb 24, 2022

On the influence of stochastic roundoff errors and their bias on the convergence of the gradient descent method with low-precision floating-point computation

Lu Xia, Stefano Massei, Michiel E. Hochstenbach et al.

When implementing the gradient descent method in low precision, the employment of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero bias property with a larger probability to preserve small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performances of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.

NAJan 24, 2020

Certified and fast computations with shallow covariance kernels

Daniel Kressner, Jonas Latz, Stefano Massei et al.

Many techniques for data science and uncertainty quantification demand efficient tools to handle Gaussian random fields, which are defined in terms of their mean functions and covariance operators. Recently, parameterized Gaussian random fields have gained increased attention, due to their higher degree of flexibility. However, especially if the random field is parameterized through its covariance operator, classical random field discretization techniques fail or become inefficient. In this work we introduce and analyze a new and certified algorithm for the low-rank approximation of a parameterized family of covariance operators which represents an extension of the adaptive cross approximation method for symmetric positive definite matrices. The algorithm relies on an affine linear expansion of the covariance operator with respect to the parameters, which needs to be computed in a preprocessing step using, e.g., the empirical interpolation method. We discuss and test our new approach for isotropic covariance kernels, such as Matérn kernels. The numerical results demonstrate the advantages of our approach in terms of computational time and confirm that the proposed algorithm provides the basis of a fast sampling procedure for parameter dependent Gaussian random fields.

NAAug 24, 2016

On the decay of the off-diagonal singular values in cyclic reduction

Dario Andrea Bini, Stefano Massei, Leonardo Robol

It was recently observed that the singular values of the off-diagonal blocks of the matrix sequences generated by the Cyclic Reduction algorithm decay exponentially. This property was used to solve, with a higher efficiency, certain quadratic matrix equations encountered in the analysis of queueing models. In this paper, we provide a sharp theoretical bound to the basis of this exponential decay together with a tool for its estimation based on a rational interpolation problem. Applications to solving $n\times n$ block tridiagonal block Toeplitz systems with $n\times n$ semiseparable blocks and certain generalized Sylvester equations in $O(n^2\log n)$ arithmetic operations are shown.