Stefano Massei

h-index: 13

10 papers

117 citations

Novelty: 41%

AI Score: 42

Ranked #62,086 of 194,257 authors (top 32%); #243 in NA (top 10%)

10 Papers

3.3 | NA | May 29, 2019

Low-rank updates and a divide-and-conquer method for linear matrix equations

Daniel Kressner, Stefano Massei, Leonardo Robol

Linear matrix equations, such as the Sylvester and Lyapunov equations, play an important role in various applications, including the stability analysis and dimensionality reduction of linear dynamical control systems and the solution of partial differential equations. In this work, we present and analyze a new algorithm, based on tensorized Krylov subspaces, for quickly updating the solution of such a matrix equation when its coefficients undergo low-rank changes. We demonstrate how our algorithm can be utilized to accelerate the Newton method for solving continuous-time algebraic Riccati equations. Our algorithm also forms the basis of a new divide-and-conquer approach for linear matrix equations with coefficients that feature hierarchical low-rank structure, such as HODLR, HSS, and banded matrices. Numerical experiments demonstrate the advantages of divide-and-conquer over existing approaches, in terms of computational time and memory consumption.
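The update idea can be illustrated on a small dense example. The sketch below assumes the Sylvester form $AX + XB = C$ and a rank-1 change $\delta A$ to $A$. Subtracting the original equation from the perturbed one gives a correction equation $(A+\delta A)\,\delta X + \delta X\,B = -\delta A\,X_0$ whose right-hand side has rank at most 1. The helper `solve_sylvester_dense` is a hypothetical stand-in based on the Kronecker form. It is not the tensorized Krylov solver of the paper and is only sensible for small sizes.

```python
import numpy as np

def solve_sylvester_dense(A, B, C):
    """Solve A X + X B = C through the Kronecker form (small problems only)."""
    n, m = C.shape
    # With column-major vec: vec(A X) = (I kron A) vec(X), vec(X B) = (B^T kron I) vec(X)
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, m), order="F")

rng = np.random.default_rng(0)
n = 12
# The shifts keep the spectra of A and -B apart, so the equation is uniquely solvable
A = rng.standard_normal((n, n)) + 10.0 * np.eye(n)
B = rng.standard_normal((n, n)) + 10.0 * np.eye(n)
C = rng.standard_normal((n, n))
X0 = solve_sylvester_dense(A, B, C)

# Rank-1 change of the coefficient A
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))
dA = 0.1 * (u @ v.T)

# Correction equation: (A + dA) dX + dX B = -dA X0, with a rank-1 right-hand side
rhs = -dA @ X0
dX = solve_sylvester_dense(A + dA, B, rhs)

X_updated = X0 + dX
X_direct = solve_sylvester_dense(A + dA, B, C)
update_error = np.linalg.norm(X_updated - X_direct)
rhs_rank = np.linalg.matrix_rank(rhs)
```

Here `X_updated` and `X_direct` agree up to rounding. The low rank of `rhs` is the property that a low-rank solver for the correction equation can exploit.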

1.2 | NA | Jun 13, 2018

Quasi-Toeplitz matrix arithmetic: a MATLAB toolbox

Dario A. Bini, Stefano Massei, Leonardo Robol

A Quasi Toeplitz (QT) matrix is a semi-infinite matrix of the kind $A=T(a)+E$ where $T(a)=(a_{j-i})_{i,j\in\mathbb Z^+}$, $E=(e_{i,j})_{i,j\in\mathbb Z^+}$ is compact and the norms $\lVert a\rVert_{\mathcal W} = \sum_{i\in\mathbb Z}|a_i|$ and $\lVert E \rVert_2$ are finite. These properties make it possible to approximate any QT-matrix, within any given precision, by means of a finite number of parameters. QT-matrices, equipped with the norm $\lVert A \rVert_{\mathcal{QT}}=\alpha\lVert a\rVert_{\mathcal{W}} + \lVert E \rVert_2$, for $\alpha = (1+\sqrt 5)/2$, are a Banach algebra with the standard arithmetic operations. We provide an algorithmic description of these operations on the finite parametrization of QT-matrices, and we develop a MATLAB toolbox implementing them in a transparent way. The toolbox is then extended to perform arithmetic operations on matrices of finite size that have a Toeplitz plus low-rank structure. This enables the development of algorithms for Toeplitz and quasi-Toeplitz matrices whose cost does not necessarily increase with the dimension of the problem. Some examples of applications to computing matrix functions and to solving matrix equations are presented, and confirm the effectiveness of the approach.

1.2 | NA | Feb 6, 2019

On maximum volume submatrices and cross approximation for symmetric semidefinite and diagonally dominant matrices

Alice Cortinovis, Daniel Kressner, Stefano Massei

The problem of finding a $k \times k$ submatrix of maximum volume of a matrix $A$ is of interest in a variety of applications. For example, it yields a quasi-best low-rank approximation constructed from the rows and columns of $A$. We show that such a submatrix can always be chosen to be a principal submatrix if $A$ is symmetric semidefinite or diagonally dominant. Then we analyze the low-rank approximation error returned by a greedy method for volume maximization, cross approximation with complete pivoting. Our bound for general matrices extends an existing result for symmetric semidefinite matrices and yields new error estimates for diagonally dominant matrices. In particular, for doubly diagonally dominant matrices the error is shown to remain within a modest factor of the best approximation error. We also illustrate how the application of our results to cross approximation for functions leads to new and better convergence results.
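As an illustration of the greedy method mentioned above, here is a minimal sketch of cross approximation with complete pivoting. Each step picks the entry of largest modulus in the current residual as pivot and subtracts the corresponding rank-1 cross. The function name and the test matrix are assumptions made for this example. For a symmetric positive semidefinite input, the pivots are expected to fall on the diagonal, so the selected row and column indices coincide.

```python
import numpy as np

def cross_approximation_complete_pivoting(A, k):
    """Greedy rank-k cross approximation with complete pivoting.

    Returns the selected row indices, column indices, and the approximant.
    """
    A = np.asarray(A, dtype=float)
    R = A.copy()                      # current residual
    rows, cols = [], []
    for _ in range(k):
        # Complete pivoting: largest residual entry in modulus
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        pivot = R[i, j]
        if pivot == 0.0:              # residual is exactly zero, nothing left to do
            break
        # Subtract the rank-1 cross built from column j and row i of the residual
        R = R - np.outer(R[:, j], R[i, :]) / pivot
        rows.append(int(i))
        cols.append(int(j))
    return rows, cols, A - R

rng = np.random.default_rng(0)
G = rng.standard_normal((30, 5))
M = G @ G.T                           # symmetric positive semidefinite, rank 5
rows, cols, M_k = cross_approximation_complete_pivoting(M, 5)
relative_error = np.linalg.norm(M - M_k) / np.linalg.norm(M)
```

Since `M` has rank 5, five steps reproduce it up to rounding, and `rows == cols` shows that the selected submatrix is principal.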

1.2 | NA | Dec 10, 2016

Decay bounds for the numerical quasiseparable preservation in matrix functions

Stefano Massei, Leonardo Robol

Given matrices $A$ and $B$ such that $B=f(A)$, where $f(z)$ is a holomorphic function, we analyze the relation between the singular values of the off-diagonal submatrices of $A$ and $B$. We provide a family of bounds which depend on the interplay between the spectrum of the argument $A$ and the singularities of the function. In particular, these bounds guarantee the numerical preservation of quasiseparable structures under mild hypotheses. We extend the Dunford-Cauchy integral formula to the case in which some poles are contained inside the contour of integration. We use this tool together with the technology of hierarchical matrices ($\mathcal H$-matrices) for the effective computation of matrix functions with quasiseparable arguments.
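The phenomenon can be observed numerically on a small example. The sketch below assumes a symmetric tridiagonal argument (the 1D discrete Laplacian) and $f = \exp$, and computes the singular values of the largest off-diagonal block of $B = f(A)$. It only illustrates the decay. It does not evaluate the bounds of the paper, and the threshold `tol` is an arbitrary choice.

```python
import numpy as np

n = 64
# Symmetric tridiagonal argument: its off-diagonal blocks have rank 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# B = exp(A) through the symmetric eigendecomposition A = V diag(w) V^T
w, V = np.linalg.eigh(A)
B = (V * np.exp(w)) @ V.T

# Singular values of the lower-left off-diagonal block of B
off_block = B[n // 2:, : n // 2]
sigma = np.linalg.svd(off_block, compute_uv=False)

# Numerical rank relative to the spectral norm of B
tol = 1e-10 * np.linalg.norm(B, 2)
numerical_rank = int(np.sum(sigma > tol))
```

Although `B` is a dense matrix, `numerical_rank` stays far below the block size `n // 2`, which is the kind of structure that hierarchical formats exploit.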

7.4 | ML | Jan 23, 2023

On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality

Lu Xia, Michiel E. Hochstenbach, Stefano Massei

When training neural networks with low-precision computation, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers. In this paper, we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-Łojasiewicz inequality. Within this context, we show that, in contrast, biased stochastic rounding errors may be beneficial since choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias in a descent direction. Furthermore, we obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performances of various rounding strategies when optimizing several examples using low-precision fixed-point number formats.
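The stagnation effect can be reproduced in a few lines. The sketch below assumes a fixed-point grid with spacing `delta` and the toy objective $f(x) = x^2/2$, and compares round-to-nearest with plain unbiased stochastic rounding. The function names are hypothetical, and the biased rounding strategies analyzed in the paper are not implemented here.

```python
import numpy as np

def round_nearest(x, delta):
    """Deterministic round-to-nearest onto the grid {k * delta}."""
    return delta * np.round(x / delta)

def round_stochastic(x, delta, rng):
    """Unbiased stochastic rounding onto the grid {k * delta}.

    Rounds up with probability equal to the fractional distance
    from the lower grid point, so the expected value equals x.
    """
    scaled = x / delta
    low = np.floor(scaled)
    round_up = rng.random(np.shape(x)) < (scaled - low)
    return delta * (low + round_up)

def gradient_descent(x0, lr, steps, rounder):
    """Gradient descent on f(x) = x^2 / 2 with every iterate rounded to the grid."""
    x = x0
    for _ in range(steps):
        x = rounder(x - lr * x)       # the gradient of x^2 / 2 is x
    return float(x)

delta = 2.0 ** -4                     # grid spacing of the assumed fixed-point format
rng = np.random.default_rng(0)
x_rn = gradient_descent(1.0, 0.01, 2000, lambda z: round_nearest(z, delta))
x_sr = gradient_descent(1.0, 0.01, 2000, lambda z: round_stochastic(z, delta, rng))
```

With round-to-nearest, the update `lr * x` is smaller than half the grid spacing, so every iterate is rounded back to the starting point and `x_rn` never moves. Stochastic rounding preserves small updates with a positive probability, so `x_sr` decreases.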

1.2 | NA | Jan 5, 2016

Efficient cyclic reduction for QBDs with rank structured blocks

Dario A. Bini, Stefano Massei, Leonardo Robol

We provide effective algorithms for solving block tridiagonal block Toeplitz systems with $m\times m$ quasiseparable blocks, as well as quadratic matrix equations with $m\times m$ quasiseparable coefficients, based on cyclic reduction and on the technology of rank-structured matrices. The algorithms rely on the exponential decay of the singular values of the off-diagonal submatrices generated by cyclic reduction. We provide a formal proof of this decay in the Markovian framework. The results of the numerical experiments that we report confirm a significant speedup over the general algorithms, already starting with the moderately small size $m\approx 10^2$.
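For reference, a plain dense version of cyclic reduction can be sketched as follows. It assumes the quadratic matrix equation is written as $A_{-1} + A_0 G + A_1 G^2 = 0$, with $A_0$ already including the $-I$ term of the Markovian setting, and it targets the minimal nonnegative solution $G$. The names and the random test problem are assumptions made for this example, and the rank-structured arithmetic that the paper relies on is not used.

```python
import numpy as np

def cyclic_reduction(Am1, A0, A1, tol=1e-14, max_iter=50):
    """Dense cyclic reduction for Am1 + A0 G + A1 G^2 = 0.

    Returns an approximation of the minimal nonnegative solution G.
    """
    Bm1, B0, B1 = Am1.copy(), A0.copy(), A1.copy()
    B0_hat = A0.copy()
    for _ in range(max_iter):
        K_m1 = np.linalg.solve(B0, Bm1)    # B0^{-1} Bm1
        K_1 = np.linalg.solve(B0, B1)      # B0^{-1} B1
        correction = B1 @ K_m1             # B1 B0^{-1} Bm1
        # All updates below use the blocks of the previous step
        B0_hat = B0_hat - correction
        B0 = B0 - Bm1 @ K_1 - correction
        Bm1 = -Bm1 @ K_m1
        B1 = -B1 @ K_1
        if np.linalg.norm(correction, np.inf) < tol:
            break
    return -np.linalg.solve(B0_hat, Am1)

rng = np.random.default_rng(0)
m = 6
blocks = rng.random((m, 3 * m))
blocks[:, :m] *= 3.0                       # favour downward transitions
blocks /= blocks.sum(axis=1, keepdims=True)  # rows of [Am1, A0 + I, A1] sum to 1
Am1 = blocks[:, :m]
A0 = blocks[:, m:2 * m] - np.eye(m)
A1 = blocks[:, 2 * m:]

G = cyclic_reduction(Am1, A0, A1)
residual = np.linalg.norm(Am1 + A0 @ G + A1 @ G @ G)
```

Each step costs a few dense solves and products with $m \times m$ blocks. These are the operations that a rank-structured representation of the blocks would make cheaper.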

7.5 | NA | Mar 20

Error formulas for block rational Krylov approximations of matrix functions

Stefano Massei, Leonardo Robol

This paper investigates explicit expressions for the error associated with the block rational Krylov approximation of matrix functions. Two formulas are proposed, both derived from characterizations of the block FOM residual. The first formula employs a block generalization of the residual polynomial, while the second leverages the block collinearity of the residuals. A posteriori error bounds based on the knowledge of spectral information of the argument are derived and tested on a set of examples. Notably, both error formulas and their corresponding upper bounds do not require the use of quadratures for their practical evaluation.

5.5 | PR | May 7

Computing the density of the Kesten-Stigum limit in supercritical Galton-Watson processes

Alice Cortinovis, Sophie Hautphenne, Stefano Massei

This paper proposes a novel numerical method for computing the density of the limit random variable associated with a supercritical Galton-Watson process. This random variable captures the effect of early demographic fluctuations and determines the random amplitude of long-term exponential population growth. While the existence of a non-trivial limit is ensured by the Kesten-Stigum theorem, computing its density in a stable and efficient manner for arbitrary offspring laws remains a significant challenge. The proposed approach leverages a functional equation that characterizes the Laplace-Stieltjes transform of the limit distribution and combines it with a moment-matching method to obtain accurate approximations within a class of linear combinations of Laguerre polynomials with exponential damping. The effectiveness of the approach is validated on several examples in which the offspring generating function is a polynomial of bounded degree.
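For orientation, the classical form of such a functional equation is recalled below. The notation is an assumption and is not taken from the paper: $Z_n$ is the population size with $Z_0 = 1$, $f$ is the offspring generating function, $m = f'(1) > 1$ is the offspring mean, and $W = \lim_{n\to\infty} Z_n / m^n$. Conditioning on the first generation gives $W = \frac{1}{m}\sum_{i=1}^{Z_1} W_i$ with independent copies $W_i$ of $W$, hence

$$
\varphi(s) = \mathbb{E}\left[e^{-sW}\right], \qquad \varphi(ms) = f\bigl(\varphi(s)\bigr), \quad s \ge 0, \qquad \varphi(0) = 1 .
$$

When the limit is non-trivial, $\mathbb{E}[W] = 1$, which adds the normalization $\varphi'(0) = -1$.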

2.3 | ML | Dec 23, 2023

AdamL: A fast adaptive gradient method incorporating loss function

Lu Xia, Stefano Massei

Adaptive first-order optimizers are fundamental tools in deep learning, although they may suffer from poor generalization due to nonuniform gradient scaling. In this work, we propose AdamL, a novel variant of the Adam optimizer that takes into account the loss function information to attain better generalization results. We provide sufficient conditions that, together with the Polyak-Lojasiewicz inequality, ensure the linear convergence of AdamL. As a byproduct of our analysis, we prove similar convergence properties for the EAdam and AdaBelief optimizers. Experimental results on benchmark functions show that AdamL typically achieves either the fastest convergence or the lowest objective function values when compared to Adam, EAdam, and AdaBelief. This superior performance is confirmed on deep learning tasks such as training convolutional neural networks, training generative adversarial networks using vanilla convolutional neural networks, and long short-term memory networks. Finally, in the case of vanilla convolutional neural networks, AdamL stands out from the other Adam variants and does not require the manual adjustment of the learning rate during the later stage of the training.

7.8 | LG | Feb 24, 2022

On the influence of stochastic roundoff errors and their bias on the convergence of the gradient descent method with low-precision floating-point computation

Lu Xia, Stefano Massei, Michiel E. Hochstenbach et al.

When implementing the gradient descent method in low precision, the employment of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performances of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.