ML LGSep 17, 2020

A Principle of Least Action for the Training of Neural Networks

Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari

arXiv:2009.08372v49.010 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses the theoretical gap in understanding neural network generalization for researchers, offering a new paradigm that could improve model control and performance in data-scarce scenarios.

The paper tackles the problem of explaining neural network generalization by viewing them as dynamical systems and identifying a low kinetic energy displacement bias linked to performance, leading to a new learning formulation based on Optimal Transport theory. The result is a novel algorithm that adapts to task complexity and achieves high generalization, especially in low data regimes.

Neural networks have been achieving high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much effort has recently been focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and having a better control over the trained models. In this work, we adopt an alternate perspective, viewing the neural network as a dynamical system displacing input particles over time. We conduct a series of experiments and, by analyzing the network's behavior through its displacements, we show the presence of a low kinetic energy displacement bias in the transport map of the network, and link this bias with generalization performance. From this observation, we reformulate the learning problem as follows: finding neural networks which solve the task while transporting the data as efficiently as possible. This offers a novel formulation of the learning problem which allows us to provide regularity results for the solution network, based on Optimal Transport theory. From a practical viewpoint, this allows us to propose a new learning algorithm, which automatically adapts to the complexity of the given task, and leads to networks with a high generalization ability even in low data regimes.

View on arXiv PDF Code

Similar