Ph. L. Toint

9papers

36citations

Novelty42%

AI Score37

Ranked #112,687 of 201,326 authors (top 56%)#25,001 in LG (top 59%)

9 Papers

OCJan 7, 2018

Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization

C. Cartis, N. I. M. Gould, Ph. L. Toint

Necessary conditions for high-order optimality in smooth nonlinear constrained optimization are explored and their inherent intricacy discussed. A two-phase minimization algorithm is proposed which can achieve approximate first-, second- and third-order criticality and its evaluation complexity is analyzed as a function of the choice (among existing methods) of an inner algorithm for solving subproblems in each of the two phases. The relation between high-order criticality and penalization techniques is finally considered, showing that standard algorithmic approaches will fail if approximate constrained high-order critical points are sought.

NASep 19, 2017

On the use of the saddle formulation in weakly-constrained 4D-VAR data assimilation

S. Gratton, S. Gürol, E. Simon et al.

This paper discusses the practical use of the saddle variational formulation for the weakly-constrained 4D-VAR method in data assimilation. It is shown that the method, in its original form, may produce erratic results or diverge because of the inherent lack of monotonicity of the produced objective function values. Convergent, variationaly coherent variants of the algorithm are then proposed whose practical performance is compared to that of other formulations. This comparison is conducted on two data assimilation instances (Burgers equation and the Quasi-Geostrophic model), using two different assumptions on parallel computing environment. Because these variants essentially retain the parallelization advantages of the original proposal, they often --- but not always --- perform best, even for moderate numbers of computing processes.

STFeb 8, 2019

Bernstein Concentration Inequalities for Tensors via Einstein Products

Z. Luo, L. Qi, Ph. L. Toint

A generalization of the Bernstein matrix concentration inequality to random tensors of general order is proposed. This generalization is based on the use of Einstein products between tensors, from which a strong link can be established between matrices and tensors, in turn allowing exploitation of existing results for the former.

OCFeb 14, 2023

Multilevel Objective-Function-Free Optimization with an Application to Neural Networks Training

S. Gratton, A. Kopanicakova, Ph. L. Toint

A class of multi-level algorithms for unconstrained nonlinear optimization is presented which does not require the evaluation of the objective function. The class contains the momentum-less AdaGrad method as a particular (single-level) instance. The choice of avoiding the evaluation of the objective function is intended to make the algorithms of the class less sensitive to noise, while the multi-level feature aims at reducing their computational cost. The evaluation complexity of these algorithms is analyzed and their behaviour in the presence of noise is then illustrated in the context of training deep neural networks for supervised learning applications.

NASep 21, 2020

Minimizing convex quadratic with variable precision conjugate gradients

S. Gratton, E. Simon, D. Titley-Peloquin et al.

We investigate the method of conjugate gradients, exploiting inaccurate matrix-vector products, for the solution of convex quadratic optimization problems. Theoretical performance bounds are derived, and the necessary quantities occurring in the theoretical bounds estimated, leading to a practical algorithm. Numerical experiments suggest that this approach has significant potential, including in the steadily more important context of multi-precision computations

LGAug 1, 2023

Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example

Ph. L. Toint

A very simple unidimensional function with Lipschitz continuous gradient is constructed such that the ADAM algorithm with constant stepsize, started from the origin, diverges when applied to minimize this function in the absence of noise on the gradient. Divergence occurs irrespective of the choice of the method parameters.

26.0LGApr 19

A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo

S. Gratton, Ph. L. Toint

A unified framework for first-order optimization algorithms fornonconvex unconstrained optimization is proposed that uses adaptivelypreconditioned gradients and includes popular methods such as full anddiagonal AdaGrad, AdaNorm, as well as adpative variants of Shampoo andMuon. This framework also allows combining heterogeneous geometriesacross different groups of variables while preserving a unifiedconvergence analysis. A fully stochastic global rate-of-convergenceanalysis is conducted for all methods in the framework, with andwithout two types of momentum, using reasonable assumptions on thevariance of the gradient oracle and without assuming boundedstochastic gradients or small enough stepsize.

NADec 9, 2018

A note on solving nonlinear optimization problems in variable precision

S. Gratton, Ph. L. Toint

This short note considers an efficient variant of the trust-region algorithm with dynamic accuracy proposed Carter (1993) and Conn, Gould and Toint (2000) as a tool for very high-performance computing, an area where it is critical to allow multi-precision computations for keeping the energy dissipation under control. Numerical experiments are presented indicating that the use of the considered method can bring substantial savings in objective function's and gradient's evaluation "energy costs" by efficiently exploiting multi-precision computations.

OCNov 9, 2018

Adaptive Regularization Algorithms with Inexact Evaluations for Nonconvex Optimization

S. Bellavia, G. Gurioli, B. Morini et al.

A regularization algorithm using inexact function values and inexact derivatives is proposed and its evaluation complexity analyzed. This algorithm is applicable to unconstrained problems and to problems with inexpensive constraints (that is constraints whose evaluation and enforcement has negligible cost) under the assumption that the derivative of highest degree is $β$-Hölder continuous. It features a very flexible adaptive mechanism for determining the inexactness which is allowed, at each iteration, when computing objective function values and derivatives. The complexity analysis covers arbitrary optimality order and arbitrary degree of available approximate derivatives. It extends results of Cartis, Gould and Toint (2018) on the evaluation complexity to the inexact case: if a $q$th order minimizer is sought using approximations to the first $p$ derivatives, it is proved that a suitable approximate minimizer within $ε$ is computed by the proposed algorithm in at most $O(ε^{-\frac{p+β}{p-q+β}})$ iterations and at most $O(|\log(ε)|ε^{-\frac{p+β}{p-q+β}})$ approximate evaluations. An algorithmic variant, although more rigid in practice, can be proved to find such an approximate minimizer in $O(|\log(ε)|+ε^{-\frac{p+β}{p-q+β}})$ evaluations.While the proposed framework remains so far conceptual for high degrees and orders, it is shown to yield simple and computationally realistic inexact methods when specialized to the unconstrained and bound-constrained first- and second-order cases. The deterministic complexity results are finally extended to the stochastic context, yielding adaptive sample-size rules for subsampling methods typical of machine learning.