27.1LGMay 2
Congestion-Aware Dynamic Axonal Delay for Spiking Neural NetworksDewei Bai, Hongxiang Peng, Yunyun Zeng et al.
Spiking Neural Networks (SNNs) are widely regarded as an energy-efficient paradigm for modeling and processing temporal and event-driven information. Incorporating delays in SNNs has been proven to be an effective mechanism for improving spike alignment in event-driven tasks. However, existing delay learning approaches predominantly assign static delays to individual synapses, resulting in a large number of delay parameters and limited adaptability to input-dependent activity dynamics. To this end, we propose a Congestion-Aware Dynamic Axonal Delay mechanism, decomposing the delay into a channel-wise static base delay for temporal structuring and a global, activity-conditioned shift that dynamically regulates the state update rate under varying spike intensities. The delay parameters are learned using differentiable linear interpolation and discretized at inference time, preserving the benefits of our dynamic delay while incurring only minimal additional cost. Experiments on speech benchmarks, including the Spiking Heidelberg Dataset, Spiking Speech Commands, and Google Speech Commands, demonstrate that introducing congestion-aware delays into synaptic signal transmission effectively improves accuracy on temporal tasks, notably achieving 93.75\% accuracy on SHD, 80.49\% accuracy on SSC, and 95.53\% on GSC-35, while reducing the parameter count by approximately 50\% compared to state-of-the-art delay-based methods with the same architecture.
32.3CVApr 25
BSViT: A Burst Spiking Vision Transformer for Expressive and Efficient Visual Representation LearningHongxiang Peng, Dewei Bai, Hong Qu
Spiking Vision Transformers (S-ViTs) offer a promising framework for energy-efficient visual learning. However, existing designs remain limited by two fundamental issues: the restricted information capacity of binary spike coding and the dense token interactions introduced by global self-attention. To address these challenges, this work proposes BSViT, a burst spiking-driven Vision Transformer featuring a Dual-Channel Burst Spiking Self-Attention (DBSSA) mechanism. DBSSA encodes queries with binary spikes and keys with burst spikes to enhance representational capacity. The value pathway adopts dual excitatory and inhibitory binary channels, enabling signed modulation and richer spike interactions. Importantly, the entire attention operation preserves addition-only computation, ensuring compatibility with energy-efficient neuromorphic hardware. To further reduce spike activity and incorporate spatial priors, a patch adjacency masking strategy is introduced to restrict attention to local neighborhoods, resulting in structure-aware sparsity and reduced computational overhead. In addition, burst spike coding is systematically integrated across the network to increase spike-level representational capacity beyond conventional binary spiking. Extensive experiments on both static and event-based vision benchmarks demonstrate that BSViT consistently outperforms existing spiking Transformers in accuracy while maintaining competitive energy efficiency.
36.4CVApr 28
QB-LIF: Learnable-Scale Quantized Burst Neurons for Efficient SNNsDewei Bai, Hongxiang Peng, Jiajun Mei et al.
Binary spike coding enables sparse and event-driven computation in spiking neural networks (SNNs), yet its 1-bit-per-timestep representation fundamentally limits information throughput. This bottleneck becomes increasingly restrictive in deep architectures under short simulation horizons. We propose the Quantized Burst-LIF (QB-LIF) neuron, which reformulates burst spiking as a saturated uniform quantization of membrane potentials with a learnable scale. Instead of relying on predefined multi-threshold structures, QB-LIF treats the quantization scale as a trainable parameter, allowing each layer to autonomously adapt its spiking resolution to the underlying membrane-potential statistics. To preserve hardware efficiency, we introduce an absorbable scale strategy that folds the learned quantized scale into synaptic weights during inference, maintaining a strict accumulate-only (AC) execution paradigm. To enable stable optimization in the discrete multi-level space, we further design ReLSG-ET, a rectified-linear surrogate gradient with exponential tails that sustains gradient flow across burst intervals. Extensive experiments on static (CIFAR-10/100, ImageNet) and event-driven (CIFAR10-DVS, DVS128-Gesture) benchmarks demonstrate that QB-LIF consistently outperforms binary and fixed-burst SNNs, achieving higher accuracy under ultra-low latency while preserving neuromorphic compatibility.
41.2CVApr 28
Vision SmolMamba: Spike-Guided Token Pruning for Energy-Efficient Spiking State-Space Vision ModelsDewei Bai, Hongxiang Peng, Yunyun Zeng et al.
Spiking Transformers have shown strong potential for long-range visual modeling through spike-driven self-attention. However, their quadratic token interactions remain fundamentally misaligned with the sparse and event-driven nature of spiking neural computation. To address this limitation, we propose Vision SmolMamba, an energy-efficient spiking state-space architecture that integrates spike-driven dynamics with linear-time selective recurrence. The key idea is a Spike-Guided Spatio-Temporal Token Pruner (SST-TP), which estimates token importance using both spike activation strength and first-spike latency. This mechanism progressively removes redundant tokens while preserving salient spatio-temporal information, enabling efficient scaling with token sparsity. Based on this mechanism, the proposed SmolMamba block incorporates spike events directly into bidirectional state-space recurrence, forming a spiking state-space vision backbone for efficient long-range modeling. Extensive experiments on both static and event-based benchmarks, including ImageNet-1K, CIFAR10/100, CIFAR10-DVS, and DVS128 Gesture, demonstrate that Vision SmolMamba consistently achieves superior accuracy-efficiency trade-offs. In particular, it reduces the estimated energy cost by at least 1.5x compared with prior spiking Transformer baselines and a Spiking Mamba variant while maintaining competitive or improved accuracy. These results demonstrate that combining spike-guided token sparsity with state-space modeling offers a scalable and energy-efficient paradigm for spiking vision systems.