LG NE ML · May 31, 2019

Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input

Maxence Ernoult, Julie Grollier, Damien Querlioz, Yoshua Bengio, Benjamin Scellier

arXiv:1905.13633v1 · 14.860 citations

Originality: Incremental advance

AI Analysis

This work addresses the computational inefficiency of EP for training convergent RNNs, making it more applicable to practical machine learning problems, though it is incremental as it builds on existing EP and BPTT methods.

The paper introduces a discrete-time version of Equilibrium Propagation (EP) that simplifies the equations and reduces simulation time, making EP more practical for machine learning tasks. It shows theoretically, and confirms numerically, that EP updates match those of Backpropagation Through Time (BPTT) step by step when the transition function derives from a primitive function and the steady state is maintained long enough. It also demonstrates training with EP that reaches ~1% test error on MNIST, the lowest error reported with EP.

Equilibrium Propagation (EP) is a biologically inspired learning algorithm for convergent recurrent neural networks, i.e. RNNs that are fed by a static input x and settle to a steady state. Training a convergent RNN consists of adjusting the weights until the steady state of the output neurons coincides with a target y. Convergent RNNs can also be trained with the more conventional Backpropagation Through Time (BPTT) algorithm. In its original formulation, EP was described for real-time neuronal dynamics, which are computationally costly to simulate. In this work, we introduce a discrete-time version of EP with simplified equations and reduced simulation time, bringing EP closer to practical machine learning tasks. We first prove theoretically, and confirm numerically, that the neural and weight updates of EP, computed by forward-time dynamics, are step-by-step equal to the ones obtained by BPTT, with gradients computed backward in time. The equality is strict when the transition function of the dynamics derives from a primitive function and the steady state is maintained long enough. We then show that for more standard discrete-time neural network dynamics the same property holds approximately, and we subsequently demonstrate training with EP that performs equivalently to BPTT. In particular, we define the first convolutional architecture trained with EP, achieving ~1% test error on MNIST, which is the lowest error reported with EP. These results can guide the development of deep neural networks trained with EP.
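The step-by-step match between EP and BPTT can be illustrated on a toy problem. The sketch below is not the paper's architecture: it assumes a single scalar neuron with a hypothetical primitive function Phi(x, s) = 0.5*w*s^2 + u*x*s, so that the dynamics are s_{t+1} = dPhi/ds = w*s_t + u*x, and a squared-error loss l = 0.5*(s - y)^2. The names `ep_vs_bptt`, `w`, `u`, `beta`, `T` and `K` are illustrative assumptions. The second phase nudges the steady state with a small strength `beta`, and the normalized EP updates are compared against gradients obtained by backward recursion through the last `K` steps. Under these assumptions the EP updates should approach the negative of the BPTT gradients as `beta` shrinks.

```python
def ep_vs_bptt(w=0.5, u=1.0, x=1.0, y=0.0, beta=1e-4, T=200, K=5):
    """Compare EP updates with BPTT gradients on a scalar linear toy network.

    Assumed primitive function: Phi = 0.5*w*s**2 + u*x*s, hence
    s_{t+1} = dPhi/ds = w*s_t + u*x. Assumed loss: l = 0.5*(s - y)**2.
    """
    # First phase: run the free dynamics for T steps (T >> K, so the
    # steady state is held over the last K steps).
    s = 0.0
    traj = [s]
    for _ in range(T):
        s = w * s + u * x
        traj.append(s)
    s_star = traj[-1]

    # BPTT: propagate dL/ds backward in time from the final step.
    bptt_s, bptt_w = [], []
    g = s_star - y                          # dL/ds_T
    for t in range(K):
        bptt_s.append(g)                    # dL/ds_{T-t}
        bptt_w.append(g * traj[T - t - 1])  # dL/dw used at step T-t
        g = w * g                           # chain rule through s -> w*s + u*x

    # EP second phase: start at the steady state and add the nudging
    # term -beta * dl/ds, running forward in time.
    ep_s, ep_w = [], []
    s_b = s_star
    for t in range(K):
        s_next = w * s_b + u * x - beta * (s_b - y)
        ep_s.append((s_next - s_b) / beta)                     # neural update
        ep_w.append((0.5 * s_next**2 - 0.5 * s_b**2) / beta)   # change in dPhi/dw
        s_b = s_next

    return bptt_s, bptt_w, ep_s, ep_w


if __name__ == "__main__":
    bptt_s, bptt_w, ep_s, ep_w = ep_vs_bptt()
    for t in range(len(ep_s)):
        print(t, ep_s[t], -bptt_s[t], ep_w[t], -bptt_w[t])
```

In this linear toy case the residual mismatch is of order `beta`, so lowering `beta` tightens the agreement until floating-point cancellation in the finite differences starts to dominate. A nonlinear or multi-neuron network would need a symmetric weight matrix (or another transition function that derives from a primitive function) for the same check to apply.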
