NE, LG · Nov 10, 2024

Recurrent Joint Embedding Predictive Architecture with Recurrent Forward Propagation Learning

arXiv:2411.16695v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building more efficient and biologically plausible vision models. It is rated incremental because it focuses on theoretical innovations and does not evaluate performance on downstream tasks.

The paper tackles the problem of designing biologically inspired vision networks that learn continuously from sequences of image patches without explicit supervision. It introduces a joint embedding predictive architecture with recurrent gated circuits, together with a new learning algorithm called Recurrent-Forward Propagation. The authors report that training avoids representational collapse and that the algorithm implements exact gradient descent efficiently for a large class of recurrent architectures.

Conventional computer vision models rely on very deep, feedforward networks processing whole images and trained offline with extensive labeled data. In contrast, biological vision relies on comparatively shallow, recurrent networks that analyze sequences of fixated image patches, learning continuously in real-time without explicit supervision. This work introduces a vision network inspired by these biological principles. Specifically, it leverages a joint embedding predictive architecture incorporating recurrent gated circuits. The network learns by predicting the representation of the next image patch (fixation) based on the sequence of past fixations, a form of self-supervised learning. We show mathematically and empirically that the training algorithm avoids the problem of representational collapse. We also introduce Recurrent-Forward Propagation, a learning algorithm that avoids biologically unrealistic backpropagation through time or memory-inefficient real-time recurrent learning. We show mathematically that the algorithm implements exact gradient descent for a large class of recurrent architectures, and confirm empirically that it learns efficiently. This paper focuses on these theoretical innovations and leaves empirical evaluation of performance in downstream tasks and analysis of representational similarity with biological vision for future work.
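The next-fixation prediction objective described in the abstract can be illustrated with a small NumPy sketch. This is only a rough illustration under stated assumptions, not the authors' architecture: the single tanh encoder, the GRU-style update gate, the linear predictor, the mean-squared-error loss, and every name here (`init_params`, `encode`, `gated_step`, `next_patch_loss`) are hypothetical. It computes the forward loss only and implements neither Recurrent-Forward Propagation nor the paper's mechanism for avoiding representational collapse.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(patch_dim, embed_dim, seed=0):
    """Random weights for a toy model (all shapes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_enc": rng.normal(0.0, scale, (embed_dim, patch_dim)),       # patch encoder
        "W_z": rng.normal(0.0, scale, (embed_dim, 2 * embed_dim)),     # update gate
        "W_h": rng.normal(0.0, scale, (embed_dim, 2 * embed_dim)),     # candidate state
        "W_pred": rng.normal(0.0, scale, (embed_dim, embed_dim)),      # predictor head
    }


def encode(params, patch):
    """Embed one flattened image patch (fixation)."""
    return np.tanh(params["W_enc"] @ patch)


def gated_step(params, h, e):
    """One gated recurrent update: blend the old state with a candidate state."""
    x = np.concatenate([h, e])
    z = sigmoid(params["W_z"] @ x)        # gate values in (0, 1)
    h_cand = np.tanh(params["W_h"] @ x)   # proposed new state
    return (1.0 - z) * h + z * h_cand


def next_patch_loss(params, patches):
    """Mean squared error between the predicted and actual next-patch embedding."""
    h = np.zeros(params["W_pred"].shape[0])
    total = 0.0
    steps = len(patches) - 1
    for t in range(steps):
        h = gated_step(params, h, encode(params, patches[t]))
        prediction = params["W_pred"] @ h          # predicted embedding of patch t+1
        target = encode(params, patches[t + 1])    # embedding of the actual next patch
        total += np.mean((prediction - target) ** 2)
    return total / steps


if __name__ == "__main__":
    params = init_params(patch_dim=16, embed_dim=8)
    patches = np.random.default_rng(1).normal(size=(5, 16))  # 5 synthetic fixations
    print(next_patch_loss(params, patches))
```

Note that a loss of this form, minimized naively, admits the trivial solution in which every patch maps to the same embedding. That is the representational collapse problem the abstract refers to; how the paper's training procedure avoids it is not specified in this summary.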

