NE · SP · DS · COMP-PH · Jun 3, 2020

Optimizing Neural Networks via Koopman Operator Theory

arXiv:2006.02361v3 · 61 citations
Originality: Highly original
AI Analysis

This paper addresses slow optimization in deep learning, particularly for deep networks whose training is a non-convex problem, though it is an incremental step with limited applicability.

The paper aims to accelerate neural network training by applying Koopman operator theory to predict network weights and biases. Within the limited training window where those predictions remain accurate, it reports being >10x faster than gradient-descent-based methods such as Adam, Adadelta, and Adagrad.

Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection. As Koopman operator theory is a linear theory, a successful implementation of it in evolving network weights and biases offers the promise of accelerated training, especially in the context of deep networks, where optimization is inherently a non-convex problem. We show that Koopman operator theoretic methods allow for accurate predictions of weights and biases of feedforward, fully connected deep networks over a non-trivial range of training time. During this window, we find that our approach is >10x faster than various gradient descent based methods (e.g. Adam, Adadelta, Adagrad), in line with our complexity analysis. We end by highlighting open questions in this exciting intersection between dynamical systems and neural network theory. We highlight additional methods by which our results could be expanded to broader classes of networks and larger training intervals, which shall be the focus of future work.
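
As a rough illustration of the idea in the abstract (fitting a linear operator to a history of weights and then using it to advance them in place of gradient steps), here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the least-squares (DMD-style) fit, the toy quadratic loss, and the names `fit_koopman`, `predict`, and `gd_trajectory` are all assumptions made for this example. On this toy problem gradient descent is exactly linear in the weights, so the fitted operator reproduces it; for a real deep network the approximation would be expected to hold only over a limited window, as the abstract notes.

```python
import numpy as np


def fit_koopman(snapshots):
    """Least-squares (DMD-style) estimate of K such that w[t+1] ~= K @ w[t].

    snapshots: array of shape (n_weights, n_steps), one column per training step.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    return Y @ np.linalg.pinv(X)


def predict(K, w, n_steps):
    """Advance the weight vector w by n_steps applications of K."""
    return np.linalg.matrix_power(K, n_steps) @ w


def gd_trajectory(H, w0, lr, n_steps):
    """Plain gradient descent on the toy loss 0.5 * w^T H w.

    Returns an array of shape (n_weights, n_steps + 1), including w0.
    """
    ws = [w0]
    for _ in range(n_steps):
        ws.append(ws[-1] - lr * (H @ ws[-1]))
    return np.stack(ws, axis=1)


# Toy setup (hypothetical values chosen for illustration only).
H = np.diag([4.0, 2.0, 0.5])
w0 = np.array([1.0, -2.0, 3.0])
traj = gd_trajectory(H, w0, lr=0.1, n_steps=30)

# Fit the operator on training steps 0..10, then predict step 30 from step 10.
K = fit_koopman(traj[:, :11])
w_pred = predict(K, traj[:, 10], 20)

# Distance between the predicted weights and the weights gradient descent reaches.
error = float(np.linalg.norm(w_pred - traj[:, 30]))
print("prediction error:", error)
```

Once `K` has been fitted, jumping ahead many steps costs one matrix power and one matrix-vector product rather than one gradient evaluation per step, which is the kind of saving the abstract's complexity argument refers to.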

Code Implementations: 1 repo
