CV · Sep 22, 2025

CSDformer: A Conversion Method for Fully Spike-Driven Transformer

arXiv:2509.17461v1 · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the energy-efficiency and training-overhead challenges of deploying transformers in neuromorphic computing, though it appears incremental: it builds on existing conversion methods with specific architectural tweaks.

The paper tackles the high training costs and hardware-unfriendly operations of spike-based transformers by proposing CSDformer, a conversion method that achieves 76.36% top-1 accuracy on ImageNet with 7 time-steps. Because no SNN has to be trained, the authors report a 75% reduction in the computational resources needed for training and a 2-3× training speedup.

Spike-based transformers are a novel architecture that aims to enhance the performance of spiking neural networks (SNNs) while mitigating the energy overhead inherent to transformers. However, methods for generating these models suffer from critical limitations: excessive training costs introduced by direct training methods, or unavoidable hardware-unfriendly operations in existing conversion methods. In this paper, we propose CSDformer, a novel conversion method for fully spike-driven transformers. We tailor a conversion-oriented transformer-based architecture and propose a new function, NReLU, to replace softmax in self-attention. This model is then quantized, trained, and converted into a fully spike-driven model with a temporal decomposition technique. We also propose delayed Integrate-and-Fire neurons to reduce conversion errors and improve the performance of spiking models. We evaluate CSDformer on the ImageNet, CIFAR-10 and CIFAR-100 datasets and achieve 76.36% top-1 accuracy under 7 time-steps on ImageNet, demonstrating superiority over state-of-the-art models. Furthermore, CSDformer eliminates the need for training SNNs, thereby reducing training costs (reducing computational resources by 75% and accelerating training speed by 2-3×). To the best of our knowledge, this is the first fully spike-driven transformer-based model developed via a conversion method, achieving high performance under ultra-low latency while dramatically reducing both computational complexity and training overhead.
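To make the idea of converting a quantized model into a spiking one more concrete, here is a minimal sketch of the textbook rate-coding equivalence that ANN-to-SNN conversion generally relies on. It is not the paper's temporal decomposition technique or its delayed Integrate-and-Fire neuron, whose details the abstract does not give. The function names `quantized_relu_level` and `if_spike_count`, the threshold of 1.0, and the constant-input assumption are all hypothetical choices made for illustration; only the 7 time-steps echo the abstract.

```python
import math


def quantized_relu_level(x, threshold=1.0, levels=7):
    """Return the integer level floor(levels * clip(x / threshold, 0, 1)).

    Assumption: a uniform, floor-based quantizer; the paper's quantizer may differ.
    """
    ratio = min(max(x / threshold, 0.0), 1.0)
    return math.floor(levels * ratio)


def if_spike_count(x, threshold=1.0, time_steps=7):
    """Count spikes of a soft-reset integrate-and-fire neuron under constant input x.

    Assumption: a plain IF neuron with zero initial membrane potential,
    not the delayed IF neuron proposed in the paper.
    """
    membrane = 0.0
    spikes = 0
    for _ in range(time_steps):
        membrane += x                # integrate the input current
        if membrane >= threshold:    # fire at most one spike per time-step
            membrane -= threshold    # soft reset: subtract the threshold, keep the residual
            spikes += 1
    return spikes


if __name__ == "__main__":
    for x in (-0.2, 0.3, 0.5, 0.9, 1.5):
        level = quantized_relu_level(x)
        spikes = if_spike_count(x)
        print(f"x={x:+.1f}  quantized level={level}  spike count={spikes}")
```

For these inputs the spike count over 7 time-steps matches the quantization level of the clipped ReLU, which is why a network trained with quantized activations can in principle be run with spiking neurons without retraining. Inputs that sit exactly on a multiple of threshold / time_steps can differ by one spike because of floating-point accumulation, a small instance of the conversion errors that such methods try to reduce.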

