CV | Mar 4, 2025

STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

arXiv:2503.02689v3 | 16 citations | h-index: 4 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses a key challenge in neuromorphic computing: improving SNN performance on applications such as image recognition. It appears incremental, however, because it builds on existing attention mechanisms.

The paper tackles the performance gap between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs) by proposing STAA-SNN, a framework that dynamically captures spatial and temporal dependencies. It reports state-of-the-art results on CIFAR10-DVS, 97.14% on CIFAR-10, and improvements of 0.33% to 2.80% with fewer time steps.

Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial challenge hindering the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-SNN) framework, which dynamically focuses on and captures both spatial and temporal dependencies. First, we introduce a spike-driven self-attention mechanism specifically designed for SNNs. Additionally, we pioneer the use of position encoding to integrate latent temporal relationships into the incoming features. For spatial-temporal information aggregation, we employ step attention to selectively amplify relevant features at different steps. Finally, we implement a time-step random dropout strategy to avoid local optima. As a result, STAA-SNN effectively captures both spatial and temporal dependencies, enabling the model to analyze complex patterns and make accurate predictions. The framework demonstrates exceptional performance across diverse datasets and exhibits strong generalization capabilities. Notably, STAA-SNN achieves state-of-the-art results on the neuromorphic dataset CIFAR10-DVS, and reaches 97.14%, 82.05% and 70.40% on the static datasets CIFAR-10, CIFAR-100 and ImageNet, respectively. Furthermore, our model exhibits improved performance ranging from 0.33% to 2.80% with fewer time steps. The code for the model is available on GitHub.
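To make three of the components named in the abstract more concrete (position encoding over time steps, step attention, and time-step random dropout), here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give formulas, so everything here is an assumption made for illustration: the sinusoidal form of the encoding (borrowed from the standard Transformer), the mean-pooled softmax used as the step score, the zero-masking form of dropout, and all function names, shapes, and parameters.

```python
import numpy as np


def temporal_position_encoding(num_steps, dim):
    """Sinusoidal encoding over time steps, shape (num_steps, dim).

    Assumption: the standard Transformer sinusoidal form is used here
    only as a stand-in for the paper's position encoding.
    """
    steps = np.arange(num_steps)[:, None]
    idx = np.arange(dim)[None, :]
    angles = steps / np.power(10000.0, (2 * (idx // 2)) / dim)
    enc = np.zeros((num_steps, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])  # even channels
    enc[:, 1::2] = np.cos(angles[:, 1::2])  # odd channels
    return enc


def step_attention(features):
    """Reweight each time step of a (T, D) feature array.

    Assumption: the per-step score is the mean activation of that step,
    normalised with a softmax across steps.
    """
    scores = features.mean(axis=1)
    scores = scores - scores.max()  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, features * weights[:, None]


def time_step_random_dropout(features, drop_prob, rng):
    """Zero out whole time steps at random, keeping at least one.

    Assumption: dropout is modelled as masking entire steps to zero.
    """
    keep = rng.random(features.shape[0]) >= drop_prob
    if not keep.any():
        keep[rng.integers(features.shape[0])] = True
    return features * keep[:, None], keep


# Hypothetical toy input: T = 4 time steps, D = 8 features of binary spikes.
rng = np.random.default_rng(0)
T, D = 4, 8
spikes = (rng.random((T, D)) < 0.3).astype(float)

encoded = spikes + temporal_position_encoding(T, D)
weights, weighted = step_attention(encoded)
dropped, keep = time_step_random_dropout(weighted, 0.25, rng)
```

In this sketch the attention weights sum to one across time steps, so the model can emphasise some steps over others, and the dropout mask removes entire steps instead of individual units. The spike-driven self-attention mechanism is omitted because the abstract gives too little detail to sketch it responsibly.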

