PRML Sep 2, 2017

A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations

arXiv:1709.00515v3, 10 citations
AI Analysis

This work gives a theoretical justification for using the SCGD algorithm on optimization problems that involve compositions of functions. The contribution is incremental, since it builds on existing methods.

The authors analyze the perturbed compositional gradient flow and show that its slow motion converges to an averaged ordinary differential equation, while the suitably rescaled deviation from that equation converges to a stochastic process with Gaussian inputs. In the strongly convex case, this implies, at the level of continuous approximation, that the Stochastic Composite Gradient Descent (SCGD) algorithm has the same asymptotic convergence time as classical stochastic gradient descent.

In this work we consider a system of two stochastic differential equations named the perturbed compositional gradient flow. By introducing a separation of fast and slow scales between the two equations, we show that the limit of the slow motion is given by an averaged ordinary differential equation. We then demonstrate that the deviation of the slow motion from the averaged equation, after proper rescaling, converges to a stochastic process with Gaussian inputs. This indicates that the slow motion can be approximated in the weak sense by a standard perturbed gradient flow, that is, by the continuous-time stochastic gradient descent algorithm that solves the optimization problem for a composition of two functions. As an application, the perturbed compositional gradient flow corresponds to the diffusion limit of the Stochastic Composite Gradient Descent (SCGD) algorithm for minimizing a composition of two expected-value functions in the optimization literature. For the strongly convex case, such an analysis implies that the SCGD algorithm has the same asymptotic convergence time as the classical stochastic gradient descent algorithm. Thus it validates, at the level of continuous approximation, the effectiveness of using the SCGD algorithm in the strongly convex case.
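To make the fast/slow structure concrete, here is a minimal sketch of SCGD in its commonly stated two-time-scale form, applied to a hypothetical scalar toy problem. It is an illustration under stated assumptions and is not taken from the paper. The inner function is g(x) = a*x, the outer function is f(y) = 0.5*(y - b)^2, and both are observed only through noisy samples. The fast variable y tracks a running average of g(x), and the slow variable x takes a gradient step using that average. The function name `scgd_scalar`, the constant step sizes `alpha` and `beta`, the Gaussian noise model, and all numerical values are assumptions chosen for the example.

```python
import random


def scgd_scalar(a=2.0, b=3.0, steps=20000, alpha=0.01, beta=0.1,
                noise=0.1, seed=0):
    """Toy SCGD for min_x f(g(x)) with g(x) = a*x and f(y) = 0.5*(y - b)**2.

    All parameters are illustrative assumptions. The objective is strongly
    convex in x when a != 0, with minimizer x* = b / a.
    """
    rng = random.Random(seed)
    x = 0.0  # slow variable: the decision variable
    y = 0.0  # fast variable: running estimate of g(x)
    for _ in range(steps):
        # Noisy sample of the inner function g at the current x.
        g_sample = a * x + noise * rng.gauss(0.0, 1.0)
        # Fast update: exponential averaging of inner-function samples.
        y = (1.0 - beta) * y + beta * g_sample
        # Noisy sample of the outer gradient f'(y), evaluated at the average.
        grad_f_sample = (y - b) + noise * rng.gauss(0.0, 1.0)
        # Slow update: chain rule, with g'(x) = a known exactly in this toy.
        x = x - alpha * a * grad_f_sample
    return x, y


if __name__ == "__main__":
    x_final, y_final = scgd_scalar()
    print("x:", x_final, "target:", 3.0 / 2.0)
    print("y:", y_final, "target:", 3.0)
```

Choosing `beta` larger than `alpha` is what separates the scales: y equilibrates quickly around g(x), so x behaves approximately like a plain stochastic gradient iterate on f(g(x)). With constant step sizes the iterates fluctuate around the minimizer rather than converging exactly.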

