LG · ML · Oct 26, 2017

Maximum Principle Based Algorithms for Deep Learning

arXiv:1710.09513v4 · 254 citations
Originality: Incremental advance
AI Analysis

This work addresses training inefficiencies in deep learning, such as slow convergence on flat landscapes, with a control-theoretic approach. The advance is incremental, but it opens new avenues for improvement.

The paper recasts the training of deep learning models as a control problem and uses Pontryagin's maximum principle to derive an alternative training algorithm. The algorithm may avoid pitfalls of gradient-based methods, such as slow convergence near saddle points, and it shows a favorable initial per-iteration convergence rate, provided the Hamiltonian maximization step can be carried out efficiently.
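For orientation, the conditions that Pontryagin's maximum principle imposes on an optimal control can be written in a standard textbook form. This is a sketch for a single input sample. The symbols below (state x_t, costate p_t, trainable parameters θ_t in an admissible set Θ, dynamics f, terminal loss Φ, running cost L) are assumed notation and do not appear in the summary above.

```latex
\begin{aligned}
&\min_{\theta}\; \Phi(x_T) + \int_0^T L(\theta_t)\,dt
  \quad \text{subject to} \quad \dot{x}_t = f(t, x_t, \theta_t),\; x_0 = x, \\
&H(t, x, p, \theta) := p \cdot f(t, x, \theta) - L(\theta), \\
&\dot{x}^*_t = \nabla_p H(t, x^*_t, p^*_t, \theta^*_t), \qquad x^*_0 = x, \\
&\dot{p}^*_t = -\nabla_x H(t, x^*_t, p^*_t, \theta^*_t), \qquad p^*_T = -\nabla \Phi(x^*_T), \\
&H(t, x^*_t, p^*_t, \theta^*_t) \ge H(t, x^*_t, p^*_t, \theta)
  \quad \text{for all } \theta \in \Theta \text{ and almost every } t \in [0, T].
\end{aligned}
```

The last line is the maximization condition. It constrains θ pointwise in time and does not require H to be differentiable in θ, which is why this formulation can in principle cover discrete trainable variables.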

The continuous dynamical system approach to deep learning is explored in order to devise alternative frameworks for training algorithms. Training is recast as a control problem, which allows us to formulate necessary optimality conditions in continuous time using Pontryagin's maximum principle (PMP). A modification of the method of successive approximations is then used to solve the PMP, giving rise to an alternative training algorithm for deep learning. This approach has the advantage that rigorous error estimates and convergence results can be established. We also show that it may avoid some pitfalls of gradient-based methods, such as slow convergence on flat landscapes near saddle points. Furthermore, we demonstrate that it obtains a favorable initial per-iteration convergence rate, provided Hamiltonian maximization can be carried out efficiently, a step that still needs improvement. Overall, the approach opens up new avenues to attack problems associated with deep learning, such as trapping in slow manifolds and the inapplicability of gradient-based methods to discrete trainable variables.
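The loop described in the abstract (forward state pass, backward costate pass, Hamiltonian maximization) can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation. It assumes hypothetical scalar dynamics dx/dt = a*x + theta, a quadratic terminal loss, and a quadratic running cost, so the Hamiltonian maximizer has a closed form. The penalty weight `rho` stands in for the "modification" of the method of successive approximations, and its exact form here is an assumption. Setting `rho=0` gives the unmodified method. All function and parameter names are illustrative.

```python
import numpy as np


def rollout(theta, a, x0, dt):
    """Forward Euler rollout of dx/dt = a*x + theta_t; returns all states."""
    x = np.empty(len(theta) + 1)
    x[0] = x0
    for n, th in enumerate(theta):
        x[n + 1] = x[n] + dt * (a * x[n] + th)
    return x


def total_loss(theta, a, x0, dt, target, lam):
    """Terminal loss 0.5*(x_T - target)^2 plus running cost (lam/2)*theta^2."""
    x_T = rollout(theta, a, x0, dt)[-1]
    return 0.5 * (x_T - target) ** 2 + 0.5 * lam * dt * np.sum(theta ** 2)


def msa(rho, iters=20, a=0.5, lam=0.1, target=1.0, x0=0.0, T=1.0, steps=20):
    """Successive approximations on the toy problem (hypothetical setup).

    Each iteration: (1) roll the state forward, (2) roll the costate
    backward, (3) maximize the (penalized) Hamiltonian at every step.
    """
    dt = T / steps
    theta = np.zeros(steps)
    for _ in range(iters):
        # 1) State equation, forward in time.
        x = rollout(theta, a, x0, dt)
        # 2) Costate equation, backward in time, with p_T = -dPhi/dx(x_T).
        p = np.empty(steps + 1)
        p[-1] = -(x[-1] - target)
        for n in reversed(range(steps)):
            p[n] = (1.0 + dt * a) * p[n + 1]
        # 3) Maximize p*(a*x + th) - (lam/2)*th^2 - (rho/2)*(th - th_old)^2
        #    over th. The objective is concave, so the stationary point
        #    th = (p + rho*th_old) / (lam + rho) is the maximizer.
        theta = (p[1:] + rho * theta) / (lam + rho)
    return theta, total_loss(theta, a, x0, dt, target, lam)


if __name__ == "__main__":
    _, loss_basic = msa(rho=0.0)
    _, loss_penalized = msa(rho=2.0)
    print("basic MSA loss:    ", loss_basic)
    print("penalized MSA loss:", loss_penalized)
```

With these particular toy parameters the unpenalized iteration overshoots and diverges, while the penalized one settles at the minimizer. This only illustrates why some modification of the basic method is needed and is not a reproduction of the paper's results.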

Code Implementations: 2 repos
