STF: Shallow-Level Temporal Feedback to Enhance Spiking Transformers
This work addresses efficiency and performance issues in SNNs for static image tasks, offering a novel encoding scheme that reduces computational costs compared to deep-level feedback methods.
The paper tackles the performance gap between Transformer-based Spiking Neural Networks (SNNs) and floating-point Artificial Neural Networks (ANNs) by proposing Shallow-level Temporal Feedback (STF), a lightweight plug-and-play module for the encoding layer. STF improves performance across various backbones on CIFAR-10, CIFAR-100, and ImageNet-1K, increases spike pattern diversity, and shows better adversarial robustness than direct coding and its variants.
Transformer-based Spiking Neural Networks (SNNs) suffer from a great performance gap compared to floating-point Artificial Neural Networks (ANNs) due to the binary nature of spike trains. Recent efforts have introduced deep-level feedback loops to transmit high-level semantic information to narrow this gap. However, these designs often span multiple deep layers, resulting in costly feature transformations, higher parameter overhead, increased energy consumption, and longer inference latency. To address this issue, we propose Shallow-level Temporal Feedback (STF), a lightweight plug-and-play module for the encoding layer, which consists of Temporal-Spatial Position Embedding (TSPE) and Temporal Feedback (TF). Extensive experiments show that STF consistently improves performance across various Transformer-based SNN backbones on static datasets, including CIFAR-10, CIFAR-100, and ImageNet-1K, under different spike timestep settings. Further analysis reveals that STF enhances the diversity of spike patterns, which is key to performance gain. Moreover, evaluations on adversarial robustness and temporal sensitivity confirm that STF outperforms direct coding and its variants, highlighting its potential as a new spike encoding scheme for static scenarios. Our code will be released upon acceptance.
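
To make the contrast between direct coding and encoder-level feedback concrete, the toy sketch below may help. It is not the paper's TSPE or TF design, whose details are not given in this abstract. It assumes a single layer of leaky integrate-and-fire neurons, and the function `lif_encode`, the matrix `feedback_weight`, and the constants `v_th` and `tau` are all hypothetical names chosen for illustration. Without feedback, the same static input is presented at every timestep (direct coding). With feedback, the previous timestep's spikes are projected back onto the encoder input, so the input current can differ from one timestep to the next.

```python
import numpy as np

def lif_encode(x, T=4, feedback_weight=None, v_th=1.0, tau=2.0):
    """Encode a static input x (shape [D]) into T binary spike vectors.

    feedback_weight is a hypothetical [D, D] matrix that maps the previous
    timestep's spikes back onto the encoder input. With None, this reduces
    to direct coding: the same input is repeated at every timestep.
    """
    x = np.asarray(x, dtype=float)
    v = np.zeros_like(x)             # membrane potential
    prev_spikes = np.zeros_like(x)   # spikes emitted at the previous timestep
    out = []
    for _ in range(T):
        current = x.copy()
        if feedback_weight is not None:
            # Assumed form of shallow feedback: add projected past spikes.
            current = current + feedback_weight @ prev_spikes
        v = v + (current - v) / tau          # leaky integration
        spikes = (v >= v_th).astype(float)   # fire where threshold is reached
        v = v * (1.0 - spikes)               # hard reset of fired neurons
        out.append(spikes)
        prev_spikes = spikes
    return np.stack(out)                     # shape [T, D]

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=8)
W = rng.normal(0.0, 0.5, size=(8, 8))

direct = lif_encode(x, T=4)                       # direct coding
with_fb = lif_encode(x, T=4, feedback_weight=W)   # toy shallow feedback
```

In this sketch the first timestep is identical in both modes, because no spikes exist yet to feed back. Any difference in later timesteps comes only from the feedback term, which is the mechanism by which a feedback path at the encoding layer could vary the spike patterns produced from a static image.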