CV Mar 4, 2025

STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang

arXiv:2503.02689v317.417 citations h-index: 4 CVPR

Originality: Incremental advance

AI Analysis

This work addresses a key challenge in neuromorphic computing by enhancing SNN performance for applications like image recognition, though it appears incremental as it builds on existing attention mechanisms.

The paper tackles the performance gap between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs) by proposing STAA-SNN, a framework that dynamically captures spatial and temporal dependencies, achieving state-of-the-art results such as 97.14% on CIFAR-10 and improved performance with fewer time steps.

Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial challenge that hinders the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-SNN) framework, which dynamically focuses on and captures both spatial and temporal dependencies. First, we introduce a spike-driven self-attention mechanism specifically designed for SNNs. Additionally, we pioneer the use of position encoding to integrate latent temporal relationships into the incoming features. For spatial-temporal information aggregation, we employ step attention to selectively amplify relevant features at different time steps. Finally, we implement a time-step random dropout strategy to avoid local optima. As a result, STAA-SNN effectively captures both spatial and temporal dependencies, enabling the model to analyze complex patterns and make accurate predictions. The framework performs strongly across diverse datasets and generalizes well. Notably, STAA-SNN achieves state-of-the-art results on the neuromorphic dataset CIFAR10-DVS and reaches 97.14%, 82.05%, and 70.40% on the static datasets CIFAR-10, CIFAR-100, and ImageNet, respectively. Furthermore, our model improves performance by 0.33% to 2.80% while using fewer time steps. The code for the model is available on GitHub.
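The abstract names step attention and time-step random dropout without giving their exact form, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes scalar per-step features, a softmax over hypothetical per-step scores for step attention, and an independent drop probability `p` per time step; the function names (`lif_spikes`, `step_attention`, `drop_time_steps`) and the leaky integrate-and-fire parameters are illustrative choices that do not come from the paper.

```python
import math
import random


def lif_spikes(currents, threshold=1.0, decay=0.5):
    """Simulate one leaky integrate-and-fire neuron over T time steps.

    Returns a list of 0/1 spikes. The hard reset and the default
    parameters are assumptions made for illustration.
    """
    v = 0.0
    spikes = []
    for current in currents:
        v = decay * v + current      # leaky integration of the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0                  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def step_attention(features, scores):
    """Aggregate per-step features using softmax weights over time steps.

    `scores` are hypothetical relevance scores, one per time step;
    higher scores amplify the corresponding step's feature.
    """
    weights = softmax(scores)
    return sum(w * f for w, f in zip(weights, features))


def drop_time_steps(features, scores, p, rng):
    """Drop each time step independently with probability p (training only).

    At least one step is always kept so the aggregation stays defined.
    """
    keep = [t for t in range(len(features)) if rng.random() >= p]
    if not keep:
        keep = [rng.randrange(len(features))]
    return [features[t] for t in keep], [scores[t] for t in keep]


if __name__ == "__main__":
    rng = random.Random(0)
    spikes = lif_spikes([0.6, 0.6, 0.6, 0.6])
    scores = [0.1, 0.2, 0.9, 0.3]    # made-up scores for the demo
    kept_features, kept_scores = drop_time_steps(spikes, scores, 0.25, rng)
    print(step_attention(kept_features, kept_scores))
```

In this toy setting, dropping time steps during training forces the aggregated output to remain useful when only a subset of steps is available, which is one plausible reading of why such a strategy could help with fewer time steps at inference.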

