OCJan 7, 2018
Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimizationC. Cartis, N. I. M. Gould, Ph. L. Toint
Necessary conditions for high-order optimality in smooth nonlinear constrained optimization are explored and their inherent intricacy discussed. A two-phase minimization algorithm is proposed which can achieve approximate first-, second- and third-order criticality and its evaluation complexity is analyzed as a function of the choice (among existing methods) of an inner algorithm for solving subproblems in each of the two phases. The relation between high-order criticality and penalization techniques is finally considered, showing that standard algorithmic approaches will fail if approximate constrained high-order critical points are sought.
NASep 19, 2017
On the use of the saddle formulation in weakly-constrained 4D-VAR data assimilationS. Gratton, S. Gürol, E. Simon et al.
This paper discusses the practical use of the saddle variational formulation for the weakly-constrained 4D-VAR method in data assimilation. It is shown that the method, in its original form, may produce erratic results or diverge because of the inherent lack of monotonicity of the produced objective function values. Convergent, variationaly coherent variants of the algorithm are then proposed whose practical performance is compared to that of other formulations. This comparison is conducted on two data assimilation instances (Burgers equation and the Quasi-Geostrophic model), using two different assumptions on parallel computing environment. Because these variants essentially retain the parallelization advantages of the original proposal, they often --- but not always --- perform best, even for moderate numbers of computing processes.
STFeb 8, 2019
Bernstein Concentration Inequalities for Tensors via Einstein ProductsZ. Luo, L. Qi, Ph. L. Toint
A generalization of the Bernstein matrix concentration inequality to random tensors of general order is proposed. This generalization is based on the use of Einstein products between tensors, from which a strong link can be established between matrices and tensors, in turn allowing exploitation of existing results for the former.
OCFeb 14, 2023
Multilevel Objective-Function-Free Optimization with an Application to Neural Networks TrainingS. Gratton, A. Kopanicakova, Ph. L. Toint
A class of multi-level algorithms for unconstrained nonlinear optimization is presented which does not require the evaluation of the objective function. The class contains the momentum-less AdaGrad method as a particular (single-level) instance. The choice of avoiding the evaluation of the objective function is intended to make the algorithms of the class less sensitive to noise, while the multi-level feature aims at reducing their computational cost. The evaluation complexity of these algorithms is analyzed and their behaviour in the presence of noise is then illustrated in the context of training deep neural networks for supervised learning applications.
NASep 21, 2020
Minimizing convex quadratic with variable precision conjugate gradientsS. Gratton, E. Simon, D. Titley-Peloquin et al.
We investigate the method of conjugate gradients, exploiting inaccurate matrix-vector products, for the solution of convex quadratic optimization problems. Theoretical performance bounds are derived, and the necessary quantities occurring in the theoretical bounds estimated, leading to a practical algorithm. Numerical experiments suggest that this approach has significant potential, including in the steadily more important context of multi-precision computations
LGAug 1, 2023
Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple examplePh. L. Toint
A very simple unidimensional function with Lipschitz continuous gradient is constructed such that the ADAM algorithm with constant stepsize, started from the origin, diverges when applied to minimize this function in the absence of noise on the gradient. Divergence occurs irrespective of the choice of the method parameters.
26.0LGApr 19
A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and MuoS. Gratton, Ph. L. Toint
A unified framework for first-order optimization algorithms fornonconvex unconstrained optimization is proposed that uses adaptivelypreconditioned gradients and includes popular methods such as full anddiagonal AdaGrad, AdaNorm, as well as adpative variants of Shampoo andMuon. This framework also allows combining heterogeneous geometriesacross different groups of variables while preserving a unifiedconvergence analysis. A fully stochastic global rate-of-convergenceanalysis is conducted for all methods in the framework, with andwithout two types of momentum, using reasonable assumptions on thevariance of the gradient oracle and without assuming boundedstochastic gradients or small enough stepsize.
NADec 9, 2018
A note on solving nonlinear optimization problems in variable precisionS. Gratton, Ph. L. Toint
This short note considers an efficient variant of the trust-region algorithm with dynamic accuracy proposed Carter (1993) and Conn, Gould and Toint (2000) as a tool for very high-performance computing, an area where it is critical to allow multi-precision computations for keeping the energy dissipation under control. Numerical experiments are presented indicating that the use of the considered method can bring substantial savings in objective function's and gradient's evaluation "energy costs" by efficiently exploiting multi-precision computations.
OCNov 9, 2018
Adaptive Regularization Algorithms with Inexact Evaluations for Nonconvex OptimizationS. Bellavia, G. Gurioli, B. Morini et al.
A regularization algorithm using inexact function values and inexact derivatives is proposed and its evaluation complexity analyzed. This algorithm is applicable to unconstrained problems and to problems with inexpensive constraints (that is constraints whose evaluation and enforcement has negligible cost) under the assumption that the derivative of highest degree is $β$-Hölder continuous. It features a very flexible adaptive mechanism for determining the inexactness which is allowed, at each iteration, when computing objective function values and derivatives. The complexity analysis covers arbitrary optimality order and arbitrary degree of available approximate derivatives. It extends results of Cartis, Gould and Toint (2018) on the evaluation complexity to the inexact case: if a $q$th order minimizer is sought using approximations to the first $p$ derivatives, it is proved that a suitable approximate minimizer within $ε$ is computed by the proposed algorithm in at most $O(ε^{-\frac{p+β}{p-q+β}})$ iterations and at most $O(|\log(ε)|ε^{-\frac{p+β}{p-q+β}})$ approximate evaluations. An algorithmic variant, although more rigid in practice, can be proved to find such an approximate minimizer in $O(|\log(ε)|+ε^{-\frac{p+β}{p-q+β}})$ evaluations.While the proposed framework remains so far conceptual for high degrees and orders, it is shown to yield simple and computationally realistic inexact methods when specialized to the unconstrained and bound-constrained first- and second-order cases. The deterministic complexity results are finally extended to the stochastic context, yielding adaptive sample-size rules for subsampling methods typical of machine learning.