SAFformer: Improving Spiking Transformer via Active Predictive Filtering
This work addresses the inefficiency of passive reactive Spiking Transformers in processing visual data, offering a more biologically plausible and energy-efficient architecture for low-power vision tasks.
SAFformer introduces an active predictive filtering paradigm for Spiking Transformers, inspired by predictive coding in the brain, to suppress redundant signals and focus on salient features. It achieves state-of-the-art results on CIFAR-10/100 and CIFAR10-DVS, and on ImageNet-1K reaches 80.50% Top-1 accuracy with 26.58M parameters and 5.88 mJ energy consumption.
Spiking Neural Networks (SNNs) offer notable advantages in biological plausibility and energy efficiency, making them promising candidates for building low-power Transformers. However, existing Spiking Transformers largely adhere to a passive reactive paradigm, which struggles to focus on task-relevant information and incurs substantial computational overhead when processing redundant visual data. To overcome this fundamental yet underexplored limitation, we propose SAFformer, a novel Spiking Transformer architecture based on an active predictive filtering paradigm. Inspired by the brain's predictive coding mechanism, SAFformer actively suppresses predictable signals and focuses on salient visual features. Extensive experiments show that SAFformer establishes new state-of-the-art performance on CIFAR-10/100 and CIFAR10-DVS. Remarkably, on ImageNet-1K, it achieves 80.50% Top-1 accuracy with only 26.58M parameters and an energy consumption of 5.88 mJ, demonstrating an exceptional balance between accuracy and efficiency.
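The abstract does not specify how the filtering is implemented, so the following is only a toy sketch of the general idea of suppressing predictable signals, not SAFformer's actual mechanism. It assumes a running-average predictor and a fixed threshold; the function name `predictive_filter` and the parameters `decay` and `threshold` are hypothetical and chosen purely for illustration.

```python
def predictive_filter(frames, decay=0.5, threshold=0.2):
    """Toy predictive filter (illustrative assumption, not the paper's method).

    frames: list of equal-length lists of floats, one list per time step.
    Returns one list of 0/1 spikes per time step. A spike is emitted only
    where the input deviates from the running prediction by more than
    `threshold`, so predictable (unchanging) inputs fall silent.
    """
    prediction = [0.0] * len(frames[0])
    spikes = []
    for frame in frames:
        # Prediction error: what the running estimate failed to anticipate.
        errors = [x - p for x, p in zip(frame, prediction)]
        spikes.append([1 if abs(e) > threshold else 0 for e in errors])
        # Move the prediction toward the observed input (exponential average).
        prediction = [decay * p + (1.0 - decay) * x
                      for p, x in zip(prediction, frame)]
    return spikes


# Five identical frames, then a change in the first element only.
frames = [[1.0, 1.0]] * 5 + [[0.0, 1.0]]
spikes = predictive_filter(frames)
```

In this toy setting the constant input stops producing spikes once the prediction has caught up, and only the element that changes in the final frame fires again. Fewer spikes on redundant input is the kind of sparsity that the energy argument above relies on.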