SDAI · May 13

NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating

arXiv:2605.13651 · 10.0
Predicted impact: top 56% in SD · last 90 days
Originality: Incremental advance
AI Analysis

For audio language models processing long-form recordings, NAACA addresses the attention bottleneck caused by dominant background patterns, enabling more efficient and accurate detection of rare salient events.

NAACA improves AudioQwen's average precision on XD-Violence from 53.50% to 70.60% while reducing unnecessary ALM invocations by reframing attention allocation as an auditory salience filtering problem with a neuro-inspired oscillatory working memory.
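The headline numbers are average precision (AP) on XD-Violence. As a reminder of what that metric rewards, here is a minimal sketch of non-interpolated AP over per-frame detector scores, using the same step-wise definition as scikit-learn's average_precision_score. It assumes binary frame-level labels; whether the paper's evaluation uses exactly this AP variant is an assumption, and the function name is illustrative.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: sum over score thresholds of (R_n - R_{n-1}) * P_n.

    labels: 1 for frames inside an annotated (e.g. violent) event, else 0.
    scores: detector confidence per frame; higher means more likely positive.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined when there are no positive frames")
    # Rank frames from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Frames with tied scores enter together, as a single threshold step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        # Precision at this threshold, weighted by the recall it adds.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]) gives 0.5 · 1 + 0.5 · 2/3 ≈ 0.833. On the reported figures, the change from 53.50% to 70.60% is a gain of 17.10 percentage points, roughly a 32% relative improvement (17.10 / 53.50 ≈ 0.320).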

Audio provides critical situational cues, yet current Audio Language Models (ALMs) face an attention bottleneck in long-form recordings, where dominant background patterns can dilute rare, salient events. We introduce NAACA, a training-free NeuroAuditory Attentive Cognitive Architecture that reframes attention allocation as an auditory salience filtering problem. At its core is OWM, a neuro-inspired Oscillatory Working Memory that maintains stable attractor-like states and invokes the ALM for higher-level reasoning only when adaptive energy fluctuations signal perceptual salience. On XD-Violence, NAACA improves AudioQwen's average precision (AP) from 53.50% to 70.60% while reducing unnecessary ALM invocations. Furthermore, qualitative case studies on the Urban Soundscapes of the World (USoW) dataset show that OWM captures novel events and subcategory shifts while remaining robust to transient pauses and ambient urban noise.
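The abstract describes OWM only at a high level. The sketch below illustrates the general pattern of salience-gated invocation, where a cheap front end watches a per-frame signal and calls the expensive ALM only when that signal departs from a learned background. It is NOT OWM: it swaps the oscillatory, attractor-like dynamics for a much simpler adaptive-threshold detector on frame energy. The names frame_energy, SalienceGate and gate_stream, the mean-square energy feature, and the parameters alpha, k and warmup are all hypothetical; call_alm stands in for whatever ALM request (for example, to AudioQwen) a real system would make.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def frame_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of consecutive non-overlapping frames (assumed feature)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


@dataclass
class SalienceGate:
    """Hypothetical stand-in for OWM: an adaptive-threshold energy detector.

    Keeps an exponentially weighted mean/variance of background energy and
    flags a frame as salient when its energy rises more than k standard
    deviations above that baseline. Not the paper's oscillatory model.
    """
    alpha: float = 0.05  # hypothetical baseline adaptation rate
    k: float = 3.0       # hypothetical threshold, in standard deviations
    warmup: int = 10     # frames observed before the gate may fire
    eps: float = 1e-9    # avoids firing on float noise when variance is 0
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def step(self, energy: float) -> bool:
        if self.n == 0:
            # First frame initialises the background baseline.
            self.mean, self.var, self.n = energy, 0.0, 1
            return False
        rise = energy - self.mean  # one-sided: energy drops never fire
        threshold = self.k * math.sqrt(self.var) + self.eps
        salient = self.n >= self.warmup and rise > threshold
        if not salient:
            # Only background frames update the baseline, so a salient burst
            # is not immediately absorbed into the "stable" state.
            self.mean += self.alpha * rise
            self.var = (1 - self.alpha) * (self.var + self.alpha * rise * rise)
        self.n += 1
        return salient


def gate_stream(energies: Sequence[float], gate: SalienceGate,
                call_alm: Callable[[int], T]) -> List[T]:
    """Invoke the (expensive) ALM only on frames the gate flags as salient."""
    results: List[T] = []
    for t, e in enumerate(energies):
        if gate.step(e):
            results.append(call_alm(t))
    return results
```

With call_alm = lambda t: t, gate_stream returns the indices of frames that would trigger an ALM call, so the fraction of frames left out gives a rough sense of the invocation savings this style of gating aims for. A detector this simple has none of the robustness to ambient noise or subcategory shifts that the abstract reports for OWM.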

