First-order and second-order variants of the gradient descent in a unified framework
This work provides a theoretical unification for machine learning practitioners, but it is incremental as it synthesizes existing methods without introducing new algorithms.
The paper addresses the unification of gradient descent variants. It proposes a general framework in which six methods, including vanilla gradient descent and Newton's method, are instances of the same approach, and it explains their specificities and the conditions under which some of them coincide.
In this paper, we provide an overview of first-order and second-order variants of the gradient descent method that are commonly used in machine learning. We propose a general framework in which six of these variants can be interpreted as different instances of the same approach. They are the vanilla gradient descent, the classical and generalized Gauss-Newton methods, the natural gradient descent method, the gradient covariance matrix approach, and Newton's method. Besides interpreting these methods within a single framework, we explain their specificities and show under which conditions some of them coincide.
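As a rough illustration of what "different instances of the same approach" can look like, the sketch below writes several of the listed methods as one preconditioned step, delta = -lr * M^{-1} * grad, where only the matrix M changes. This form, the toy least-squares problem, and all names (`preconditioned_step`, `M_gd`, `M_newton`, `M_gauss_newton`, `M_grad_cov`) are assumptions made for illustration, not the formulation given in the paper. The gradient covariance matrix is taken here as the uncentered second moment of per-sample gradients, which is also an assumption. The natural gradient and generalized Gauss-Newton methods are omitted to keep the example short.

```python
import numpy as np

def preconditioned_step(grad, M, lr=1.0):
    """Hypothetical shared update: delta = -lr * M^{-1} grad."""
    return -lr * np.linalg.solve(M, grad)

# Toy problem (assumed for illustration): linear least squares,
# L(theta) = 0.5 * mean((X @ theta - y) ** 2), with noiseless targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true
theta = np.zeros(3)
n = X.shape[0]

residual = X @ theta - y
per_sample_grads = X * residual[:, None]   # gradient of each sample's loss, shape (n, 3)
grad = per_sample_grads.mean(axis=0)       # gradient of the mean loss

# Each method corresponds to a different choice of M in the same update.
M_gd = np.eye(3)                           # vanilla gradient descent
M_newton = X.T @ X / n                     # Hessian of L for this quadratic loss
M_gauss_newton = X.T @ X / n               # J^T J / n, where the model Jacobian is J = X
M_grad_cov = per_sample_grads.T @ per_sample_grads / n  # uncentered gradient second moment

step_gd = preconditioned_step(grad, M_gd, lr=0.1)
step_newton = preconditioned_step(grad, M_newton)
step_gauss_newton = preconditioned_step(grad, M_gauss_newton)
step_grad_cov = preconditioned_step(grad, M_grad_cov)

# For a linear model with squared loss, classical Gauss-Newton and Newton coincide.
print(np.allclose(step_newton, step_gauss_newton))
```

On this toy problem the Hessian equals J^T J because the model is linear in its parameters, so the Newton and classical Gauss-Newton steps are identical. This is one standard case of coincidence and is not necessarily the full set of conditions the paper derives.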