NE · Jan 31, 2022

Towards Scaling Difference Target Propagation by Learning Backprop Targets

Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

arXiv:2201.13415 · Has Code

Originality: Highly original

AI Analysis

This work addresses the problem of scaling biologically plausible learning algorithms, a question of interest to researchers in computational neuroscience and AI, though it is incremental in that it builds on existing DTP methods.

The paper tackles the challenge of scaling Difference Target Propagation (DTP), a biologically plausible learning algorithm, to real-world tasks. It proposes a novel feedback weight training scheme that ensures DTP approximates backpropagation and that allows layer-wise training of the feedback weights without sacrificing theoretical guarantees. The result is the best performance reported for DTP on CIFAR-10 and ImageNet 32×32.

The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32×32.
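For readers unfamiliar with DTP, below is a minimal NumPy sketch of the classic DTP update of Lee et al. (2015). It is not the feedback weight training scheme proposed in this paper. It assumes a small tanh MLP with a squared-error loss; the function names (`forward`, `feedback`, `dtp_targets`, `local_update`, `reconstruction_loss`), layer sizes and step sizes are hypothetical illustrative choices. The sketch shows the three ingredients the abstract refers to: targets sent down a learned feedback pathway, the "difference" correction, and purely layer-local weight updates.

```python
import numpy as np

# Hypothetical sketch of classic DTP (Lee et al., 2015) for a tanh MLP.
# NOT the feedback-weight training scheme proposed in the paper above.

def forward(Ws, x):
    """Return activations [h_0, ..., h_L] with h_{l+1} = tanh(W_l h_l)."""
    hs = [x]
    for W in Ws:
        hs.append(np.tanh(W @ hs[-1]))
    return hs

def feedback(V, h_next):
    """Learned approximate inverse g_l: maps layer l+1 back to layer l."""
    return np.tanh(V @ h_next)

def dtp_targets(hs, Vs, y, beta):
    """Top target is a small gradient step on 0.5 * ||h_L - y||^2; lower
    targets use the difference correction t_l = h_l + g_l(t_{l+1}) - g_l(h_{l+1})."""
    L = len(hs) - 1
    targets = [None] * (L + 1)  # the input h_0 gets no target
    targets[L] = hs[L] - beta * (hs[L] - y)
    for l in range(L - 1, 0, -1):
        targets[l] = hs[l] + feedback(Vs[l], targets[l + 1]) - feedback(Vs[l], hs[l + 1])
    return targets

def local_loss(W, h_in, t_out):
    """Layer-local loss 0.5 * ||tanh(W h_in) - t_out||^2."""
    return 0.5 * np.sum((np.tanh(W @ h_in) - t_out) ** 2)

def local_update(W, h_in, t_out, lr):
    """One gradient step on local_loss; needs no gradient from other layers."""
    out = np.tanh(W @ h_in)
    delta = (out - t_out) * (1.0 - out ** 2)  # derivative through tanh
    return W - lr * np.outer(delta, h_in)

def reconstruction_loss(W, V, h, sigma, rng):
    """Classic DTP feedback loss: g_l should undo f_l around noisy activations."""
    h_noisy = h + sigma * rng.standard_normal(h.shape)
    return 0.5 * np.sum((feedback(V, np.tanh(W @ h_noisy)) - h_noisy) ** 2)

# Usage on a 4-8-8-3 network with random weights.
rng = np.random.default_rng(0)
dims = [4, 8, 8, 3]
Ws = [0.5 * rng.standard_normal((dims[l + 1], dims[l])) for l in range(3)]
Vs = [0.5 * rng.standard_normal((dims[l], dims[l + 1])) for l in range(3)]
x = rng.standard_normal(4)
y = np.array([0.5, -0.5, 0.0])

hs = forward(Ws, x)
ts = dtp_targets(hs, Vs, y, beta=0.1)
# Each W_l is nudged so that tanh(W_l h_l) moves toward the target t_{l+1}.
new_Ws = [local_update(Ws[l], hs[l], ts[l + 1], lr=0.01) for l in range(3)]
```

The difference correction is what makes the targets coincide with the activations when `beta = 0`, even though `g_l` is only an approximate inverse of the forward layer. The quality of that approximation, set by how the feedback weights `Vs` are trained (here, only the classic noise-perturbed reconstruction loss is shown), governs how closely the resulting updates follow backpropagation. That feedback-training step is the part of DTP this paper revisits.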

