OCLGNAApr 6, 2021

A Caputo fractional derivative-based algorithm for optimization

arXiv:2104.02259v113 citations
Originality Incremental advance
AI Analysis

This is an incremental improvement for optimization in machine learning, offering a new method to potentially speed up gradient-based algorithms.

The authors tackled optimization problems by proposing a Caputo fractional gradient descent (CFGD) algorithm, which they proved converges to regularized solutions for quadratic functions and computationally showed that an adaptive version accelerates over gradient descent by mitigating condition number dependence.

We propose a novel Caputo fractional derivative-based optimization algorithm. Upon defining the Caputo fractional gradient with respect to the Cartesian coordinate, we present a generic Caputo fractional gradient descent (CFGD) method. We prove that the CFGD yields the steepest descent direction of a locally smoothed objective function. The generic CFGD requires three parameters to be specified, and a choice of the parameters yields a version of CFGD. We propose three versions -- non-adaptive, adaptive terminal and adaptive order. By focusing on quadratic objective functions, we provide a convergence analysis. We prove that the non-adaptive CFGD converges to a Tikhonov regularized solution. For the two adaptive versions, we derive error bounds, which show convergence to integer-order stationary point under some conditions. We derive an explicit formula of CFGD for quadratic functions. We computationally found that the adaptive terminal (AT) CFGD mitigates the dependence on the condition number in the rate of convergence and results in significant acceleration over gradient descent (GD). For non-quadratic functions, we develop an efficient implementation of CFGD using the Gauss-Jacobi quadrature, whose computational cost is approximately proportional to the number of the quadrature points and the cost of GD. Our numerical examples show that AT-CFGD results in acceleration over GD, even when a small number of the Gauss-Jacobi quadrature points (including a single point) is used.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes