LG CL | Aug 14, 2025

Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Enhanced Model Architectures

Parsa Omidi, Xingshuai Huang, Axel Laborieux, Bahareh Nikpour, Tianyu Shi, Armaghan Eshaghi

arXiv:2508.10824v213.07 citations | h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work addresses how to enhance Transformer architectures for lifelong learning and adaptability. Its contribution is incremental: it synthesizes existing research rather than introducing new methods.

This review tackles the limitations of Transformers in long-range context retention and continual learning. It proposes a unified framework that bridges neuroscience principles with engineering advances in Memory-Augmented Transformers, organizes progress along three taxonomic dimensions, and identifies open challenges such as scalability and interference.

Memory is fundamental to intelligence, enabling learning, reasoning, and adaptability across biological and artificial systems. While Transformer architectures excel at sequence modeling, they face critical limitations in long-range context retention, continual learning, and knowledge integration. This review presents a unified framework bridging neuroscience principles, including dynamic multi-timescale memory, selective attention, and consolidation, with engineering advances in Memory-Augmented Transformers. We organize recent progress through three taxonomic dimensions: functional objectives (context extension, reasoning, knowledge integration, adaptation), memory representations (parameter-encoded, state-based, explicit, hybrid), and integration mechanisms (attention fusion, gated control, associative retrieval). Our analysis of core memory operations (reading, writing, forgetting, and capacity management) reveals a shift from static caches toward adaptive, test-time learning systems. We identify persistent challenges in scalability and interference, alongside emerging solutions including hierarchical buffering and surprise-gated updates. This synthesis provides a roadmap toward cognitively-inspired, lifelong-learning Transformer architectures.
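The four core memory operations named in the abstract (reading, writing, forgetting, and capacity management) can be illustrated with a toy explicit key-value memory. This is a minimal sketch, not a method from the review: the class `ToyMemory`, its parameters (`capacity`, `surprise_threshold`, `decay`), and the choice of Euclidean prediction error as the "surprise" signal are all assumptions made for illustration.

```python
import math


class ToyMemory:
    """Toy explicit key-value memory. All names here are hypothetical."""

    def __init__(self, capacity=4, surprise_threshold=0.5, decay=0.9):
        self.capacity = capacity
        self.surprise_threshold = surprise_threshold
        self.decay = decay
        self.slots = []  # each slot is [key, value, strength]

    @staticmethod
    def _dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def read(self, query):
        """Associative retrieval: softmax-weighted average of stored values."""
        if not self.slots:
            return None
        scores = [self._dot(query, key) for key, _, _ in self.slots]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.slots[0][1])
        return [
            sum(w * slot[1][i] for w, slot in zip(weights, self.slots)) / total
            for i in range(dim)
        ]

    def surprise(self, key, value):
        """Prediction error between `value` and what the memory recalls for `key`."""
        predicted = self.read(key)
        if predicted is None:
            return float("inf")  # an empty memory cannot predict anything
        return math.sqrt(sum((p - v) ** 2 for p, v in zip(predicted, value)))

    def write(self, key, value):
        """Surprise-gated write. Returns True if the pair was stored."""
        # Forgetting: every existing slot loses strength on each write attempt.
        for slot in self.slots:
            slot[2] *= self.decay
        if self.surprise(key, value) < self.surprise_threshold:
            return False  # already well predicted, so skip the write
        # Capacity management: evict the weakest slot when the memory is full.
        if len(self.slots) >= self.capacity:
            weakest = min(range(len(self.slots)), key=lambda i: self.slots[i][2])
            self.slots.pop(weakest)
        self.slots.append([list(key), list(value), 1.0])
        return True


mem = ToyMemory(capacity=2)
mem.write([1.0, 0.0], [1.0, 0.0])  # stored: an empty memory is maximally surprised
mem.write([1.0, 0.0], [1.0, 0.0])  # skipped: the memory already predicts this value
```

In this sketch the write gate and the read path share the same retrieval routine, so an item is only stored when the memory fails to predict it. Real systems surveyed in the review may compute surprise, decay, and eviction quite differently.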
