AI · Mar 19

MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation

Zuher Jahshan, Ben Ben Ishay, Leonid Yavits

arXiv:2603.18676 · 27.0 · h-index: 7

Predicted impact: top 90% in AI · last 90 days · Originality: Highly original

AI Analysis

This work targets the computational inefficiency and limited expressiveness of transformer attention, offering ML/AI researchers and practitioners a compatible, efficient alternative to standard attention.

The paper tackles the quadratic complexity and lack of global integration in standard multi-head attention by proposing MANAR, a memory-augmented attention mechanism based on Global Workspace Theory. MANAR scales in linear time and matches or exceeds baselines, reaching 85.1 on GLUE, 83.9% on ImageNet-1K, and 2.7% WER on LibriSpeech.

The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract concepts and an Abstract Conceptual Representation (ACR). The architecture follows a two-stage logic that maps directly to GWT mechanics: (i) an integration phase, where retrieved memory concepts converge to form a collective "mental image" (the ACR) based on input stimuli; and (ii) a broadcasting phase, where this global state navigates and informs the contextualization of individual local tokens. We demonstrate that efficient linear-time scaling is a fundamental architectural byproduct of instantiating the GWT functional bottleneck, as routing global information through a constant-sized ACR resolves the quadratic complexity inherent in standard attention. MANAR is a compatible re-parameterization of MHA with identical semantic roles for its projections, enabling knowledge transfer from pretrained transformers via weight-copy and thus overcoming the adoption barriers of structurally incompatible linear-time alternatives. MANAR enables non-convex contextualization, synthesizing representations that provably lie outside the convex hull of input tokens, a mathematical reflection of the creative synthesis described in GWT. Empirical evaluations confirm that MANAR matches or exceeds strong baselines across language (GLUE score of 85.1), vision (83.9% on ImageNet-1K), and speech (2.7% WER on LibriSpeech), positioning it as an efficient and expressive alternative to quadratic attention.
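
The abstract does not give MANAR's equations, so the sketch below is not the authors' method. It is a generic constant-size latent-bottleneck attention in NumPy, meant only to illustrate why routing information through a fixed number of slots scales linearly with sequence length. The function names (`bottleneck_attention`, `score_entries`), the single-head layout, the shared projections, and the choice of letting memory slots read from tokens are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bottleneck_attention(X, M, Wq, Wk, Wv):
    """Hypothetical two-stage bottleneck attention (illustrative only).

    X: (n, d) input tokens. M: (m, d) memory slots, with m fixed.
    Wq, Wk, Wv: (d, d) projections, shared by both stages (an assumption).
    """
    d = X.shape[1]
    # Stage 1, "integration": the m memory slots read from the n tokens,
    # producing a constant-size global state. Score matrix is (m, n).
    acr = softmax((M @ Wq) @ (X @ Wk).T / np.sqrt(d)) @ (X @ Wv)
    # Stage 2, "broadcasting": each token reads from the m-slot state.
    # Score matrix is (n, m).
    out = softmax((X @ Wq) @ (acr @ Wk).T / np.sqrt(d)) @ (acr @ Wv)
    return out, acr

def score_entries(n, m):
    # Attention-score entries: two-stage bottleneck vs. full self-attention.
    return 2 * n * m, n * n

rng = np.random.default_rng(0)
n, m, d = 1024, 16, 64
X = rng.normal(size=(n, d))
M = rng.normal(size=(m, d))  # would be a trained parameter in practice
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, acr = bottleneck_attention(X, M, Wq, Wk, Wv)
print(out.shape, acr.shape, score_entries(n, m))
```

With m held constant, both score matrices grow as n times m, so doubling the sequence length doubles the work, whereas the n-by-n matrix of full self-attention quadruples it. The sketch does not attempt to reproduce the weight-copy compatibility or the non-convex contextualization property claimed in the abstract.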
