LG NE NA ML · Feb 9, 2020

On the distance between two neural networks and the stability of learning

Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu

arXiv:2002.03432v3 · 21.274 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of training deeper and more complex neural networks more efficiently. By reducing the need for hyperparameter tuning, it could save machine learning practitioners considerable effort.

The paper relates the distance between two parameter settings to gradient breakdown, meaning how far the gradient can change when the parameters move, for nonlinear compositional functions. The result is a new distance function, deep relative trust, together with a descent lemma for neural networks. Together they may simplify training by requiring little to no learning-rate tuning.
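To make the distance concrete, here is a minimal sketch of deep relative trust as we read it from the paper. For an L-layer network with weights W_l and perturbations ΔW_l, the relative change in the gradient is modelled as at most ∏_l (1 + ‖ΔW_l‖_F / ‖W_l‖_F) − 1. This exact form, the use of Frobenius norms, and the helper names `frobenius_norm` and `deep_relative_trust` are our assumptions for illustration; consult the paper for the precise statement and its conditions. Matrices are plain lists of rows so that the sketch needs only the standard library.

```python
import math
from typing import List

Matrix = List[List[float]]  # a weight matrix stored as a list of rows


def frobenius_norm(m: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))


def deep_relative_trust(weights: List[Matrix], perturbations: List[Matrix]) -> float:
    """Assumed form: prod_l (1 + ||dW_l||_F / ||W_l||_F) - 1.

    Hypothetical helper illustrating the paper's distance: it models the
    relative change in the gradient when each layer l moves from W_l to
    W_l + dW_l.
    """
    if len(weights) != len(perturbations):
        raise ValueError("expected one perturbation per layer")
    product = 1.0
    for w, dw in zip(weights, perturbations):
        w_norm = frobenius_norm(w)
        if w_norm == 0.0:
            raise ValueError("relative distance is undefined for a zero layer")
        product *= 1.0 + frobenius_norm(dw) / w_norm
    return product - 1.0


# Two layers, each perturbed by 10% of its own norm.
layers = [[[3.0, 4.0]], [[3.0, 4.0]]]
deltas = [[[0.5, 0.0]], [[0.0, 0.5]]]
print(deep_relative_trust(layers, deltas))  # approximately 0.21
```

Perturbing each of two layers by 10% of its norm gives (1.1)² − 1 ≈ 0.21 rather than 0.2: under the product form, the permitted gradient change compounds with depth and grows faster than a simple sum of per-layer relative changes.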

This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems to require little to no learning-rate tuning, it may unlock a simpler workflow for training deeper and more complex neural networks. The Python code used in this paper is available at https://github.com/jxbz/fromage.
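The learning rule suggested by this analysis (named Fromage in the linked repository) scales each layer's gradient so that the step is a fixed fraction of that layer's weight norm, then divides by √(1 + η²) to keep the weights from growing. Below is a minimal pure-Python sketch, assuming the per-layer update W ← (W − η ‖W‖_F / ‖g‖_F · g) / √(1 + η²). The function name `fromage_step` and the fallback to a plain gradient step when either norm is zero are our own choices; the code at https://github.com/jxbz/fromage remains the authoritative version.

```python
import math
from typing import List

Matrix = List[List[float]]  # a weight or gradient matrix stored as a list of rows


def frobenius_norm(m: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))


def fromage_step(weights: List[Matrix], grads: List[Matrix], lr: float = 0.01) -> List[Matrix]:
    """Hypothetical sketch of one Fromage-style update applied layer by layer."""
    if len(weights) != len(grads):
        raise ValueError("expected one gradient per layer")
    shrink = 1.0 / math.sqrt(1.0 + lr * lr)  # counteracts norm growth
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm, g_norm = frobenius_norm(w), frobenius_norm(g)
        # Rescale the gradient so the step has norm lr * ||W||_F.
        # Assumption: fall back to the raw gradient if either norm is zero.
        ratio = w_norm / g_norm if w_norm > 0.0 and g_norm > 0.0 else 1.0
        new_weights.append([
            [shrink * (wij - lr * ratio * gij) for wij, gij in zip(w_row, g_row)]
            for w_row, g_row in zip(w, g)
        ])
    return new_weights


# Gradient orthogonal to the weights: the update preserves the weight norm.
(updated,) = fromage_step([[[1.0, 0.0]]], [[[0.0, 2.0]]], lr=0.1)
print(frobenius_norm(updated))  # approximately 1.0
```

Each layer moves by η times its own norm before rescaling, so every term ‖ΔW_l‖_F / ‖W_l‖_F in deep relative trust equals η, and a single learning rate controls the relative perturbation at every depth. When the gradient is orthogonal to the weights, ‖W − η ‖W‖_F/‖g‖_F · g‖²_F = (1 + η²)‖W‖²_F, so the division by √(1 + η²) leaves the norm unchanged, which is the case exercised at the bottom of the block.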

