NE, LG · Jun 1, 2022

A Theoretical Framework for Inference Learning

Nick Alonso, Beren Millidge, Jeff Krichmar, Emre Neftci

arXiv:2206.00164v1 · 19.223 citations · h-index: 39 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more biologically plausible alternatives to backpropagation in deep learning, offering incremental theoretical insights and practical improvements for researchers in computational neuroscience and machine learning.

The paper addresses the limited mathematical understanding of the biologically plausible inference learning (IL) algorithm. It develops a theoretical framework that links IL to implicit stochastic gradient descent and shows that a novel implementation of IL is more stable across learning rates, converges faster with small mini-batches, and matches backpropagation's performance with large mini-batches.

Backpropagation (BP) is the most successful and widely used algorithm in deep learning. However, the computations required by BP are challenging to reconcile with known neurobiology. This difficulty has stimulated interest in more biologically plausible alternatives to BP. One such algorithm is the inference learning algorithm (IL). IL has close connections to neurobiological models of cortical function and has matched BP's performance on supervised learning and auto-associative tasks. In contrast to BP, however, the mathematical foundations of IL are not well understood. Here, we develop a novel theoretical framework for IL. Our main result is that IL closely approximates an optimization method known as implicit stochastic gradient descent (implicit SGD), which is distinct from the explicit SGD implemented by BP. Our results further show how the standard implementation of IL can be altered to better approximate implicit SGD. Our novel implementation considerably improves the stability of IL across learning rates; this is consistent with our theory, since a key property of implicit SGD is its stability. We provide extensive simulation results that further support our theoretical interpretations and also demonstrate that IL converges faster when trained with small mini-batches while matching the performance of BP with large mini-batches.
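The stability difference between explicit and implicit SGD can be seen on a toy problem. Explicit SGD evaluates the gradient at the current parameters, θ' = θ − η∇L(θ); implicit SGD evaluates it at the new parameters, θ' = θ − η∇L(θ'), which is equivalent to the proximal step θ' = argmin_u L(u) + ‖u − θ‖²/(2η). The sketch below is only an illustration of that general distinction, not the paper's IL algorithm: it assumes a hypothetical one-parameter least-squares loss, toy data, and illustrative function names (`explicit_step`, `implicit_step`, `run`), and uses the closed-form solution of the implicit update that exists for this quadratic loss.

```python
# Toy comparison of explicit vs. implicit SGD on a one-parameter quadratic loss.
# Assumption: L(theta) = mean over (a, b) of 0.5 * (a*theta - b)**2 (illustrative only).

TOY_BATCH = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical data; optimum theta* = 2


def grad(theta, batch):
    """Gradient of the mean squared-error loss at theta."""
    return sum(a * (a * theta - b) for a, b in batch) / len(batch)


def explicit_step(theta, batch, lr):
    # Explicit SGD: gradient evaluated at the current parameters.
    return theta - lr * grad(theta, batch)


def implicit_step(theta, batch, lr):
    # Implicit SGD: solve theta_new = theta - lr * grad(theta_new).
    # For this quadratic loss grad(u) = A*u - B, so the fixed point is
    # theta_new = (theta + lr*B) / (1 + lr*A), with A = mean(a^2), B = mean(a*b).
    A = sum(a * a for a, _ in batch) / len(batch)
    B = sum(a * b for a, b in batch) / len(batch)
    return (theta + lr * B) / (1 + lr * A)


def run(step, lr, steps=50, theta=0.0, batch=TOY_BATCH):
    """Apply `step` repeatedly and return the final parameter."""
    for _ in range(steps):
        theta = step(theta, batch, lr)
    return theta


if __name__ == "__main__":
    for lr in (0.1, 1.0, 10.0):
        print(f"lr={lr}: explicit={run(explicit_step, lr):.3g}, "
              f"implicit={run(implicit_step, lr):.6f}")
```

Here A = 14/3, so the explicit update multiplies the error by (1 − ηA) and diverges once η exceeds 2/A ≈ 0.43, while the implicit update multiplies it by 1/(1 + ηA) and contracts toward θ* = 2 for every positive learning rate. This is the kind of learning-rate robustness the abstract attributes to implicit SGD.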

