LG ML Jun 25, 2020

A Theoretical Framework for Target Propagation

Alexander Meulemans, Francesco S. Carzaniga, Johan A. K. Suykens, João Sacramento, Benjamin F. Grewe

arXiv:2006.14331v4 20.6 104 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses biologically plausible alternatives to backpropagation, a topic of interest to both neuroscience and AI researchers. It is rated incremental because it builds on existing target propagation methods.

The paper tackles the performance gap between backpropagation and biologically plausible learning algorithms such as target propagation. Analyzing target propagation from an optimization perspective, it identifies a fundamental limitation of difference target propagation in non-invertible networks and proposes a remedy that improves both performance and the alignment of weight updates with loss gradients.

The success of deep learning, a brain-inspired form of AI, has sparked interest in understanding how the brain could similarly learn across multiple layers of neurons. However, the majority of biologically-plausible learning algorithms have not yet reached the performance of backpropagation (BP), nor are they built on strong theoretical foundations. Here, we analyze target propagation (TP), a popular but not yet fully understood alternative to BP, from the standpoint of mathematical optimization. Our theory shows that TP is closely related to Gauss-Newton optimization and thus substantially differs from BP. Furthermore, our analysis reveals a fundamental limitation of difference target propagation (DTP), a well-known variant of TP, in the realistic scenario of non-invertible neural networks. We provide a first solution to this problem through a novel reconstruction loss that improves feedback weight training, while simultaneously introducing architectural flexibility by allowing for direct feedback connections from the output to each hidden layer. Our theory is corroborated by experimental results that show significant improvements in performance and in the alignment of forward weight updates with loss gradients, compared to DTP.
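To make the distinction between TP and DTP concrete, here is a minimal NumPy sketch of how layer targets are computed in each scheme. It assumes the standard formulation of DTP from the earlier literature (feedback mapping plus a difference correction) and does not implement the reconstruction loss or the direct feedback connections proposed in this paper. The names `forward`, `feedback`, `W`, and `Q`, and the choice of `tanh`, are illustrative assumptions.

```python
import numpy as np

def forward(W, h):
    # Forward mapping of one layer: h_next = tanh(W h).
    return np.tanh(W @ h)

def feedback(Q, h_next):
    # Feedback mapping, meant to approximate an inverse of `forward`.
    # Here Q is random, so the approximation is poor on purpose.
    return np.tanh(Q @ h_next)

def tp_target(Q, h_next_target):
    # Plain target propagation: send the target through the feedback path.
    return feedback(Q, h_next_target)

def dtp_target(Q, h, h_next, h_next_target):
    # Difference target propagation: add a correction term that cancels
    # the reconstruction error of the feedback path at the current activation.
    return feedback(Q, h_next_target) + h - feedback(Q, h_next)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 4 -> 3 units, so the layer is not invertible
Q = rng.normal(size=(4, 3))   # hypothetical, untrained feedback weights
h = rng.normal(size=4)
h_next = forward(W, h)

# If the downstream target equals the current activation, DTP leaves the
# layer unchanged, even though the feedback path is a poor inverse.
unchanged = dtp_target(Q, h, h_next, h_next)

# A small nudge of the downstream activation yields a nearby target for h.
nudged = dtp_target(Q, h, h_next, h_next - 0.1 * h_next)
```

The correction term guarantees a consistent target when nothing needs to change, but it does not by itself make the propagated targets follow a Gauss-Newton direction when the layer is non-invertible. That is the gap the abstract refers to.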
