AI · Mar 19

MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation

arXiv:2603.18676 · 27.0 · h-index: 7
Predicted impact: top 90% in AI · last 90 days
Originality: Highly original
AI Analysis

This work targets ML/AI researchers and practitioners concerned with the computational inefficiency and limited expressiveness of transformers, offering an efficient alternative that remains compatible with standard attention.

The paper tackles the quadratic complexity and lack of global integration in standard multi-head attention by proposing MANAR, a memory-augmented attention mechanism based on Global Workspace Theory. MANAR scales in linear time and matches or exceeds baselines, reporting a GLUE score of 85.1, 83.9% on ImageNet-1K, and 2.7% WER on LibriSpeech.

The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract concepts and an Abstract Conceptual Representation (ACR). The architecture follows a two-stage logic that maps directly to GWT mechanics: (i) an integration phase, where retrieved memory concepts converge to form a collective "mental image" (the ACR) based on input stimuli; and (ii) a broadcasting phase, where this global state navigates and informs the contextualization of individual local tokens. We demonstrate that efficient linear-time scaling is a fundamental architectural byproduct of instantiating the GWT functional bottleneck, as routing global information through a constant-sized ACR resolves the quadratic complexity inherent in standard attention. MANAR is a compatible re-parameterization of MHA with identical semantic roles for its projections, enabling knowledge transfer from pretrained transformers via weight-copy and thus overcoming the adoption barriers of structurally incompatible linear-time alternatives. MANAR also enables non-convex contextualization, synthesizing representations that provably lie outside the convex hull of the input tokens, a mathematical reflection of the creative synthesis described in GWT. Empirical evaluations confirm that MANAR matches or exceeds strong baselines across language (GLUE score of 85.1), vision (83.9% on ImageNet-1K), and speech (2.7% WER on LibriSpeech), positioning it as an efficient and expressive alternative to quadratic attention.
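The two-stage logic and the linear-time argument can be illustrated with a minimal sketch. This is not the paper's implementation: the names `workspace_layer`, `memory`, `acr`, and `score_counts` are hypothetical, learned projections and multiple heads are omitted, and both stages are assumed to be plain scaled dot-product attention. Because both stages are plain softmax averages, the outputs here remain convex combinations of the input tokens, so the sketch does not reproduce the non-convex contextualization the abstract describes. It only shows why routing through a constant number of slots m replaces the n × n score matrix with two n × m ones.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on lists of vectors.

    Evaluates len(queries) * len(keys) dot-product scores.
    """
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(width)])
    return out


def workspace_layer(tokens, memory):
    """Hypothetical two-stage bottleneck (an assumption, not MANAR itself)."""
    # Stage 1, integration: m memory slots query the n tokens,
    # producing an m-slot summary that stands in for the ACR.
    acr = attend(memory, tokens, tokens)
    # Stage 2, broadcasting: each of the n tokens queries the m-slot summary.
    return attend(tokens, acr, acr)


def score_counts(n, m):
    """Dot-product scores needed: (full self-attention, two-stage bottleneck)."""
    return n * n, 2 * n * m


random.seed(0)
dim, n_tokens, n_slots = 4, 6, 2
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
memory = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_slots)]

contextualized = workspace_layer(tokens, memory)
print(len(contextualized), len(contextualized[0]))
print(score_counts(1024, 16))
```

With n = 1024 tokens and m = 16 slots, full self-attention needs 1024 × 1024 = 1,048,576 scores, while the two-stage route needs 2 × 1024 × 16 = 32,768. For fixed m the second count grows linearly in n, which is the scaling behaviour the abstract attributes to the constant-sized ACR.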