Marco Prato

10 papers

403 citations

Novelty 48%

AI Score 27

Ranked #161,861 of 201,326 authors (top 80%); #1,465 in NA (top 255%)

10 Papers

LG · Mar 2, 2022 · Code

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform

Carmelo Scribano, Giorgia Franchini, Marco Prato et al.

Since their introduction, Transformer architectures have emerged as the dominant architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive" architectures arises from the computation of the dot-product attention, which grows both in memory consumption and number of operations as $O(n^2)$, where $n$ stands for the input sequence length, thus limiting the applications that require modeling very long sequences. Several approaches have been proposed so far in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. Extensive experiments show that our method takes up less memory for the same performance, while also drastically reducing inference time. This makes it particularly suitable in real-time contexts on embedded platforms. Moreover, we expect that the results of our research may serve as a starting point for a broader family of deep neural models with reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public
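
As a rough illustration of the idea described in this abstract (and not the authors' implementation, which is in the repository linked above), the sketch below compresses a length-n sequence to m coefficients with a truncated orthonormal DCT-II, runs ordinary dot-product attention on the m compressed rows, and maps the result back to length n. The helper names (`dct_matrix`, `dct_attention`) are hypothetical, and using the compressed input directly as queries, keys and values (no learned projections) is a simplifying assumption.

```python
import math

def dct_matrix(n, m):
    """First m rows of the n x n orthonormal DCT-II matrix."""
    rows = []
    for k in range(m):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax_rows(scores):
    out = []
    for row in scores:
        mx = max(row)  # subtract the row maximum for numerical stability
        e = [math.exp(v - mx) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out

def attention(q, k, v):
    """Plain scaled dot-product attention on lists of row vectors."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def dct_attention(x, m):
    """Attention on m DCT coefficients instead of n tokens (assumes m <= n)."""
    n = len(x)
    d_m = dct_matrix(n, m)            # m x n truncated transform
    x_c = matmul(d_m, x)              # m x d compressed sequence
    y_c = attention(x_c, x_c, x_c)    # score matrix is m x m, not n x n
    return matmul(transpose(d_m), y_c)  # inverse transform back to n x d
```

With this construction the score matrix has m² entries rather than n², which is where the memory saving in such a scheme would come from; how many coefficients can be dropped without hurting accuracy is an empirical question that the sketch does not address.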

NA · Apr 8, 2017

On the convergence of a linesearch based proximal-gradient method for nonconvex optimization

Silvia Bonettini, Ignace Loris, Federica Porta et al.

We consider a variable metric linesearch based proximal gradient method for the minimization of the sum of a smooth, possibly nonconvex function and a convex, possibly nonsmooth term. We prove convergence of this iterative algorithm to a critical point if the objective function satisfies the Kurdyka-Lojasiewicz property at each point of its domain, under the assumption that a limit point exists. The proposed method is applied to a wide collection of image processing problems, and our numerical tests show that the algorithm is flexible, robust and competitive when compared to recently proposed approaches able to address the optimization problems arising in the considered applications.

NA · Feb 26, 2015

New convergence results for the scaled gradient projection method

Silvia Bonettini, Marco Prato

The aim of this paper is to deepen the convergence analysis of the scaled gradient projection (SGP) method, proposed by Bonettini et al. in a recent paper for constrained smooth optimization. The main feature of SGP is the presence of a variable scaling matrix multiplying the gradient, which may change at each iteration. In the last few years, extensive numerical experimentation has shown that SGP equipped with a suitable choice of the scaling matrix is a very effective tool for solving large scale variational problems arising in image and signal processing. In spite of the very reliable numerical results observed, only a weak, though very general, convergence theorem has been provided, establishing that any limit point of the sequence generated by SGP is stationary. Here, under the only assumption that the objective function is convex and that a solution exists, we prove that the sequence generated by SGP converges to a minimum point, if the sequence of scaling matrices satisfies a simple and implementable condition. Moreover, assuming that the gradient of the objective function is Lipschitz continuous, we are also able to prove the O(1/k) convergence rate with respect to the objective function values. Finally, we present the results of numerical experiments on some relevant image restoration problems, showing that the proposed scaling matrix selection rule also performs well from the computational point of view.
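
A minimal sketch of a scaled gradient projection iteration may help fix ideas. It assumes nonnegativity constraints and a diagonal scaling matrix (so the projection reduces to a componentwise clip at zero) and uses a standard Armijo backtracking along the direction y - x. The function name `sgp`, its parameters and the user-supplied `scaling` callback are illustrative assumptions; the scaling selection rule and the condition analysed in the paper are not reproduced here.

```python
def sgp(f, grad, x0, scaling, alpha=1.0, beta=1e-4, theta=0.5, iters=200):
    """Scaled gradient projection sketch for min f(x) subject to x >= 0.

    scaling(x, k) returns the positive diagonal of the scaling matrix
    at iteration k (hypothetical interface).
    """
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        s = scaling(x, k)
        # Projected scaled gradient step: y = P(x - alpha * D * grad f(x)).
        y = [max(0.0, xi - alpha * si * gi) for xi, si, gi in zip(x, s, g)]
        d = [yi - xi for yi, xi in zip(y, x)]
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope > -1e-14:  # direction is numerically zero: x is stationary
            break
        # Armijo backtracking along the feasible direction d.
        lam, fx = 1.0, f(x)
        while f([xi + lam * di for xi, di in zip(x, d)]) > fx + beta * lam * slope:
            lam *= theta
        x = [xi + lam * di for xi, di in zip(x, d)]
    return x
```

For example, with f(x) = 0.5 * ((x0 - 1)^2 + 10 * (x1 + 2)^2), starting point [3, 3] and constant scaling [1.0, 0.1], the iterate reaches the constrained minimizer [1, 0] after one step and the loop stops at the next stationarity check.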

NA · Jan 19, 2015

A new steplength selection for scaled gradient methods with application to image deblurring

Federica Porta, Marco Prato, Luca Zanni

Gradient methods are frequently used in large scale image deblurring problems since they avoid the onerous computation of the Hessian matrix of the objective function. Second order information is typically sought by a clever choice of the steplength parameter defining the descent direction, as in the case of the well-known Barzilai and Borwein rules. In a recent paper, a strategy for the steplength selection approximating the inverse of some eigenvalues of the Hessian matrix has been proposed for gradient methods applied to unconstrained minimization problems. In the quadratic case, this approach is based on a Lanczos process applied every m iterations to the matrix of the most recent m back gradients but the idea can be extended to a general objective function. In this paper we extend this rule to the case of scaled gradient projection methods applied to non-negatively constrained minimization problems, and we test the effectiveness of the proposed strategy in image deblurring problems in both the presence and the absence of an explicit edge-preserving regularization term.
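
For reference, the two classical Barzilai and Borwein steplengths mentioned in this abstract can be computed from the last two iterates and gradients as sketched below. This shows only the well-known BB rules, not the Lanczos-based rule that the paper extends; the function name `bb_steplengths` is an illustrative choice.

```python
def bb_steplengths(x_prev, x_curr, g_prev, g_curr):
    """Return (BB1, BB2) steplengths, or None if the curvature s^T y is not positive."""
    s = [a - b for a, b in zip(x_curr, x_prev)]   # change in the iterate
    y = [a - b for a, b in zip(g_curr, g_prev)]   # change in the gradient
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0.0:
        return None  # callers typically fall back to a default steplength
    ss = sum(si * si for si in s)
    yy = sum(yi * yi for yi in y)
    bb1 = ss / sy   # "long" BB step
    bb2 = sy / yy   # "short" BB step, never larger than bb1
    return bb1, bb2
```

For a quadratic objective both values are inverses of Rayleigh quotients of the Hessian, so they lie between the reciprocals of its largest and smallest eigenvalues; this is the sense in which a steplength can carry second order information.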

NA · Feb 24, 2015

A cyclic block coordinate descent method with generalized gradient projections

Silvia Bonettini, Marco Prato, Simone Rebegoldi

The aim of this paper is to present the convergence analysis of a very general class of gradient projection methods for smooth, constrained, possibly nonconvex, optimization. The key features of these methods are the Armijo linesearch along a suitable descent direction and the non-Euclidean metric employed to compute the gradient projection. We develop a very general framework from the point of view of block-coordinate descent methods, which are useful when the constraints are separable.

IV · Oct 18, 2022

Explainable bilevel optimization: an application to the Helsinki deblur challenge

Silvia Bonettini, Giorgia Franchini, Danilo Pezzi et al.

In this paper we present a bilevel optimization scheme for the solution of a general image deblurring problem, in which a parametric variational-like approach is encapsulated within a machine learning scheme to provide a high quality reconstructed image with automatically learned parameters. The ingredients of the variational lower level and the machine learning upper one are specifically chosen for the Helsinki Deblur Challenge 2021, in which sequences of letters must be recovered from out-of-focus photographs with increasing levels of blur. Our proposed procedure for the reconstructed image consists of a fixed number of FISTA iterations applied to the minimization of an edge preserving and binarization enforcing regularized least-squares functional. The parameters defining the variational model and the optimization steps, which, unlike most deep learning approaches, all have a precise and interpretable meaning, are learned via either a similarity index or a support vector machine strategy. Numerical experiments on the test images provided by the challenge authors show significant gains with respect to a standard variational approach and performance comparable with that of some of the proposed deep learning based algorithms, which require the optimization of millions of parameters.
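
To make "a fixed number of FISTA iterations" concrete, here is a generic sketch of FISTA for an objective of the form smooth term plus nonsmooth term. It is not the challenge pipeline: the regularizer in the usage note is a plain l1 penalty chosen for illustration (the paper uses an edge preserving and binarization enforcing functional), and the names `fista` and `soft_threshold` are hypothetical.

```python
import math

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1, applied componentwise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def fista(grad, prox, x0, step, iters):
    """Run a fixed number of FISTA iterations.

    grad(y)       gradient of the smooth part at y
    prox(v, step) proximity operator of the nonsmooth part with parameter step
    step          steplength, typically 1/L with L a Lipschitz constant of grad
    """
    x, y, t = list(x0), list(x0), 1.0
    for _ in range(iters):
        g = grad(y)
        x_new = prox([yi - step * gi for yi, gi in zip(y, g)], step)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Extrapolation (momentum) step that distinguishes FISTA from ISTA.
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x
```

As a usage example under these assumptions, minimizing 0.5 * ||x - b||^2 + lam * ||x||_1 corresponds to `grad = lambda y: [yi - bi for yi, bi in zip(y, b)]` and `prox = lambda v, step: soft_threshold(v, lam * step)` with `step = 1.0`. Because the iteration count is fixed in advance, the whole procedure is a finite composition of simple maps, which is what makes its parameters amenable to being learned in an outer level.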

OC · Jun 2, 2020

Deep neural networks for inverse problems with pseudodifferential operators: an application to limited-angle tomography

Tatiana A. Bubba, Mathilde Galinier, Matti Lassas et al.

We propose a novel convolutional neural network (CNN), called $Ψ$DONet, designed for learning pseudodifferential operators ($Ψ$DOs) in the context of linear inverse problems. Our starting point is the Iterative Soft Thresholding Algorithm (ISTA), a well-known algorithm for solving sparsity-promoting minimization problems. We show that, under rather general assumptions on the forward operator, the unfolded iterations of ISTA can be interpreted as the successive layers of a CNN, which in turn provides fairly general network architectures that, for a specific choice of the parameters involved, make it possible to reproduce ISTA, or a perturbation of ISTA for which we can bound the coefficients of the filters. Our case study is the limited-angle X-ray transform and its application to limited-angle computed tomography (LA-CT). In particular, we prove that, in the case of LA-CT, the operations of upscaling, downscaling and convolution, which characterize our $Ψ$DONet and most deep learning schemes, can be exactly determined by combining the convolutional nature of the limited-angle X-ray transform and basic properties defining an orthogonal wavelet system. We test two different implementations of $Ψ$DONet on simulated data from limited-angle geometry, generated from the ellipse data set. Both implementations provide equally good and noteworthy preliminary results, showing the potential of the approach we propose and paving the way for applying the same idea to other convolutional operators which are $Ψ$DOs or Fourier integral operators.

OC · Dec 11, 2018

Deep Unfolding of a Proximal Interior Point Method for Image Restoration

Carla Bertocchi, Emilie Chouzenoux, Marie-Caroline Corbineau et al.

Variational methods are widely applied to ill-posed inverse problems for they have the ability to embed prior knowledge about the solution. However, the level of performance of these methods significantly depends on a set of parameters, which can be estimated through computationally expensive and time-consuming methods. In contrast, deep learning offers very generic and efficient architectures, at the expense of explainability, since it is often used as a black-box, without any fine control over its output. Deep unfolding provides a convenient approach to combine variational-based and deep learning approaches. Starting from a variational formulation for image restoration, we develop iRestNet, a neural network architecture obtained by unfolding a proximal interior point algorithm. Hard constraints, encoding desirable properties for the restored image, are incorporated into the network thanks to a logarithmic barrier, while the barrier parameter, the stepsize, and the penalization weight are learned by the network. We derive explicit expressions for the gradient of the proximity operator for various choices of constraints, which allows training iRestNet with gradient descent and backpropagation. In addition, we provide theoretical results regarding the stability of the network for a common inverse problem example. Numerical experiments on image deblurring problems show that the proposed approach compares favorably with both state-of-the-art variational and machine learning methods in terms of image quality.
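
The role of explicit proximity-operator gradients can be illustrated on the simplest case: a scalar logarithmic barrier enforcing x > 0. The sketch below is an assumption-laden toy, not iRestNet itself. It uses the closed form of the minimizer of 0.5 * (x - v)^2 - gamma * mu * log(x), together with its derivatives with respect to the input v and the barrier parameter mu; the function names are hypothetical.

```python
import math

def prox_log_barrier(v, gamma, mu):
    """Minimizer over x > 0 of 0.5 * (x - v)**2 - gamma * mu * log(x).

    Setting the derivative to zero gives x**2 - v * x - gamma * mu = 0,
    whose positive root is returned.
    """
    return 0.5 * (v + math.sqrt(v * v + 4.0 * gamma * mu))

def dprox_dv(v, gamma, mu):
    """Derivative of the proximity operator with respect to its input v."""
    return 0.5 * (1.0 + v / math.sqrt(v * v + 4.0 * gamma * mu))

def dprox_dmu(v, gamma, mu):
    """Derivative of the proximity operator with respect to the barrier parameter mu."""
    return gamma / math.sqrt(v * v + 4.0 * gamma * mu)
```

Having such derivatives in closed form is what allows a layer built around the proximity operator to be differentiated during backpropagation, both through its input and through learned quantities such as the barrier parameter. The output is strictly positive for any real v whenever gamma * mu > 0, so the constraint is satisfied by construction.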

NA · Jun 1, 2015

Variable metric inexact line-search based methods for nonsmooth optimization

Silvia Bonettini, Ignace Loris, Federica Porta et al.

We develop a new proximal-gradient method for minimizing the sum of a differentiable, possibly nonconvex, function and a convex, possibly nondifferentiable, function. The key features of the proposed method are the definition of a suitable descent direction, based on the proximal operator associated with the convex part of the objective function, and an Armijo-like rule to determine the step size along this direction, ensuring a sufficient decrease of the objective function. In this framework, we especially address the possibility of adopting a metric which may change at each iteration and an inexact computation of the proximal point defining the descent direction. For the more general nonconvex case, we prove that all limit points of the sequence of iterates are stationary, while for convex objective functions we prove the convergence of the whole sequence to a minimizer, under the assumption that a minimizer exists. In the latter case, assuming also that the gradient of the smooth part of the objective function is Lipschitz continuous, we also give a convergence rate estimate, showing the O(1/k) complexity with respect to the function values. We also discuss verifiable sufficient conditions for the inexact proximal point and we present the results of numerical experiments on a convex total variation based image restoration problem, showing that the proposed approach is competitive with another state-of-the-art method.
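
The structure described in this abstract (a proximal point defining a descent direction, followed by an Armijo-like backtracking on the full objective) can be sketched as follows. This is a simplified illustration under stated assumptions: the metric is fixed and Euclidean, the proximal point is computed exactly, and the predicted decrease omits any quadratic term. The variable metric and the inexact proximal computation studied in the paper are not modelled, and all names are hypothetical.

```python
import math

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1 (used only in the usage example)."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def prox_gradient_linesearch(f, grad, h, prox_h, x0,
                             alpha=1.0, beta=1e-4, delta=0.5, iters=100):
    """Minimize f(x) + h(x), with f smooth and h convex.

    prox_h(v, alpha) is the proximity operator of alpha * h at v.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Proximal point y and descent direction d = y - x.
        y = prox_h([xi - alpha * gi for xi, gi in zip(x, g)], alpha)
        d = [yi - xi for yi, xi in zip(y, x)]
        # Predicted decrease; it is nonpositive and zero only at stationary points.
        predicted = sum(gi * di for gi, di in zip(g, d)) + h(y) - h(x)
        if predicted > -1e-14:
            break
        # Armijo-like backtracking on the whole objective f + h.
        lam, fx = 1.0, f(x) + h(x)
        while True:
            z = [xi + lam * di for xi, di in zip(x, d)]
            if f(z) + h(z) <= fx + beta * lam * predicted:
                break
            lam *= delta
        x = z
    return x
```

For instance, with f(x) = 0.5 * ||x - b||^2, h(x) = ||x||_1 and `prox_h = soft_threshold`, starting from the origin with b = [3, -0.5, 1.5], the first proximal point is already the minimizer [2, 0, 0.5], the unit step is accepted, and the next iteration detects stationarity. Note that the linesearch acts on the steplength along d rather than on the proximal parameter alpha, so only one proximal evaluation is needed per iteration.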

NA · Jun 25, 2014

A scaled gradient projection method for Bayesian learning in dynamical systems

Silvia Bonettini, Alessandro Chiuso, Marco Prato

A crucial task in system identification problems is the selection of the most appropriate model class, which is classically addressed by resorting to cross-validation or using asymptotic arguments. As recently suggested in the literature, this can be addressed in a Bayesian framework, where model complexity is regulated by a few hyperparameters, which can be estimated via marginal likelihood maximization. It is thus of primary importance to design effective optimization methods to solve the corresponding optimization problem. If the unknown impulse response is modeled as a Gaussian process with a suitable kernel, the maximization of the marginal likelihood leads to a challenging nonconvex optimization problem, which requires a stable and effective solution strategy. In this paper we address this problem by means of a scaled gradient projection algorithm, in which the scaling matrix and the steplength parameter play a crucial role in providing a meaningful solution in a computational time comparable with second order methods. In particular, we propose both a generalization of the split gradient approach to design the scaling matrix in the presence of box constraints, and an effective implementation of the gradient and objective function. The extensive numerical experiments carried out on several test problems show that our method is very effective, providing in a few tenths of a second solutions with accuracy comparable with state-of-the-art approaches. Moreover, the flexibility of the proposed strategy makes it easily adaptable to a wider range of problems arising in different areas of machine learning, signal processing and system identification.