Zhaofei Yu

CV
h-index22
63papers
3,175citations
Novelty57%
AI Score64

63 Papers

NEFeb 4, 2023Code
Reducing ANN-SNN Conversion Error through Residual Membrane Potential

Zecheng Hao, Tong Bu, Jianhao Ding et al. · pku

Spiking Neural Networks (SNNs) have received extensive academic attention due to the unique properties of low power consumption and high-speed computing on neuromorphic chips. Among various training methods of SNNs, ANN-SNN conversion has shown the equivalent level of performance as ANNs on large-scale datasets. However, unevenness error, which refers to the deviation caused by different temporal sequences of spike arrival on activation layers, has not been effectively resolved and seriously suffers the performance of SNNs under the condition of short time-steps. In this paper, we make a detailed analysis of unevenness error and divide it into four categories. We point out that the case of the ANN output being zero while the SNN output being larger than zero accounts for the largest percentage. Based on this, we theoretically prove the sufficient and necessary conditions of this case and propose an optimization strategy based on residual membrane potential to reduce unevenness error. The experimental results show that the proposed method achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet datasets. For example, we reach top-1 accuracy of 64.32\% on ImageNet with 10-steps. To the best of our knowledge, this is the first time ANN-SNN conversion can simultaneously achieve high accuracy and ultra-low-latency on the complex dataset. Code is available at https://github.com/hzc1208/ANN2SNN\_SRP.

NEFeb 21, 2023Code
Bridging the Gap between ANNs and SNNs by Calibrating Offset Spikes

Zecheng Hao, Jianhao Ding, Tong Bu et al. · pku

Spiking Neural Networks (SNNs) have attracted great attention due to their distinctive characteristics of low power consumption and temporal information processing. ANN-SNN conversion, as the most commonly used training method for applying SNNs, can ensure that converted SNNs achieve comparable performance to ANNs on large-scale datasets. However, the performance degrades severely under low quantities of time-steps, which hampers the practical applications of SNNs to neuromorphic chips. In this paper, instead of evaluating different conversion errors and then eliminating these errors, we define an offset spike to measure the degree of deviation between actual and desired SNN firing rates. We perform a detailed analysis of offset spike and note that the firing of one additional (or one less) spike is the main cause of conversion errors. Based on this, we propose an optimization strategy based on shifting the initial membrane potential and we theoretically prove the corresponding optimal shifting distance for calibrating the spike. In addition, we also note that our method has a unique iterative property that enables further reduction of conversion errors. The experimental results show that our proposed method achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet datasets. For example, we reach a top-1 accuracy of 67.12% on ImageNet when using 6 time-steps. To the best of our knowledge, this is the first time an ANN-SNN conversion has been shown to simultaneously achieve high accuracy and ultralow latency on complex datasets. Code is available at https://github.com/hzc1208/ANN2SNN_COS.

CVMar 21, 2023Code
SpikeCV: Open a Continuous Computer Vision Era

Yajing Zheng, Jiyuan Zhang, Rui Zhao et al. · pku

SpikeCV is a new open-source computer vision platform for the spike camera, which is a neuromorphic visual sensor that has developed rapidly in recent years. In the spike camera, each pixel position directly accumulates the light intensity and asynchronously fires spikes. The output binary spikes can reach a frequency of 40,000 Hz. As a new type of visual expression, spike sequence has high spatiotemporal completeness and preserves the continuous visual information of the external world. Taking advantage of the low latency and high dynamic range of the spike camera, many spike-based algorithms have made significant progress, such as high-quality imaging and ultra-high-speed target detection. To build up a community ecology for the spike vision to facilitate more users to take advantage of the spike camera, SpikeCV provides a variety of ultra-high-speed scene datasets, hardware interfaces, and an easy-to-use modules library. SpikeCV focuses on encapsulation for spike data, standardization for dataset interfaces, modularization for vision tasks, and real-time applications for challenging scenes. With the advent of the open-source Python ecosystem, modules of SpikeCV can be used as a Python library to fulfilled most of the numerical analysis needs of researchers. We demonstrate the efficiency of the SpikeCV on offline inference and real-time applications. The project repository address are \url{https://openi.pcl.ac.cn/Cordium/SpikeCV} and \url{https://github.com/Zyj061/SpikeCV

NEApr 25, 2023Code
Parallel Spiking Neurons with High Efficiency and Ability to Learn Long-term Dependencies

Wei Fang, Zhaofei Yu, Zhaokun Zhou et al.

Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset to a general formulation, we propose the Parallel Spiking Neuron (PSN), which generates hidden states that are independent of their predecessors, resulting in parallelizable neuronal dynamics and extremely high simulation speed. The weights of inputs in the PSN are fully connected, which maximizes the utilization of temporal information. To avoid the use of future inputs for step-by-step inference, the weights of the PSN can be masked, resulting in the masked PSN. By sharing weights across time-steps based on the masked PSN, the sliding PSN is proposed to handle sequences of varying lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first study about parallelizing spiking neurons and can be a cornerstone for the spiking deep learning research. Our codes are available at \url{https://github.com/fangwei123456/Parallel-Spiking-Neuron}.

LGFeb 25, 2023Code
A Unified Framework for Soft Threshold Pruning

Yanqi Chen, Zhengyu Ma, Wei Fang et al.

Soft threshold pruning is among the cutting-edge pruning methods with state-of-the-art performance. However, previous methods either perform aimless searching on the threshold scheduler or simply set the threshold trainable, lacking theoretical explanation from a unified perspective. In this work, we reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA), a classic method from the fields of sparse recovery and compressed sensing. Under this theoretical framework, all threshold tuning strategies proposed in previous studies of soft threshold pruning are concluded as different styles of tuning $L_1$-regularization term. We further derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework. This scheduler keeps $L_1$-regularization coefficient stable, implying a time-invariant objective function from the perspective of optimization. In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD. We conduct extensive experiments and verify its state-of-the-art performance on both Artificial Neural Networks (ResNet-50 and MobileNet-V1) and Spiking Neural Networks (SEW ResNet-18) on ImageNet datasets. On the basis of this framework, we derive a family of pruning methods, including sparsify-during-training, early pruning, and pruning at initialization. The code is available at https://github.com/Yanqi-Chen/LATS.

NEOct 25, 2023
SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence

Wei Fang, Yanqi Chen, Jianhao Ding et al.

Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency by introducing neural dynamics and spike properties. As the emerging spiking deep learning paradigm attracts increasing interest, traditional programming frameworks cannot meet the demands of the automatic differentiation, parallel computation acceleration, and high integration of processing neuromorphic datasets and deployment. In this work, we present the SpikingJelly framework to address the aforementioned dilemma. We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips. Compared to existing methods, the training of deep SNNs can be accelerated $11\times$, and the superior extensibility and flexibility of SpikingJelly enable users to accelerate custom models at low costs through multilevel inheritance and semiautomatic code generation. SpikingJelly paves the way for synthesizing truly energy-efficient SNN-based machine intelligence systems, which will enrich the ecology of neuromorphic computing.

CVJul 12, 2023
Unsupervised Optical Flow Estimation with Dynamic Timing Representation for Spike Camera

Lujie Xia, Ziluo Ding, Rui Zhao et al. · pku

Efficiently selecting an appropriate spike stream data length to extract precise information is the key to the spike vision tasks. To address this issue, we propose a dynamic timing representation for spike streams. Based on multi-layers architecture, it applies dilated convolutions on temporal dimension to extract features on multi-temporal scales with few parameters. And we design layer attention to dynamically fuse these features. Moreover, we propose an unsupervised learning method for optical flow estimation in a spike-based manner to break the dependence on labeled data. In addition, to verify the robustness, we also build a spike-based synthetic validation dataset for extreme scenarios in autonomous driving, denoted as SSES dataset. It consists of various corner cases. Experiments show that our method can predict optical flow from spike streams in different high-speed scenes, including real scenes. For instance, our method gets $15\%$ and $19\%$ error reduction from the best spike-based work, SCFlow, in $Δt=10$ and $Δt=20$ respectively which are the same settings as the previous works.

CVSep 19, 2024Code
Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks

Xian Zhong, Shengwang Hu, Wenxuan Liu et al.

Spiking neural networks (SNNs) have garnered significant attention for their low power consumption and high biological interpretability. Their rich spatio-temporal information processing capability and event-driven nature make them ideally well-suited for neuromorphic datasets. However, current SNNs struggle to balance accuracy and latency in classifying these datasets. In this paper, we propose Hybrid Step-wise Distillation (HSD) method, tailored for neuromorphic datasets, to mitigate the notable decline in performance at lower time steps. Our work disentangles the dependency between the number of event frames and the time steps of SNNs, utilizing more event frames during the training stage to improve performance, while using fewer event frames during the inference stage to reduce latency. Nevertheless, the average output of SNNs across all time steps is susceptible to individual time step with abnormal outputs, particularly at extremely low time steps. To tackle this issue, we implement Step-wise Knowledge Distillation (SKD) module that considers variations in the output distribution of SNNs at each time step. Empirical evidence demonstrates that our method yields competitive performance in classification tasks on neuromorphic datasets, especially at lower time steps. Our code will be available at: {https://github.com/hsw0929/HSD}.

CVApr 6, 2023
Spike Stream Denoising via Spike Camera Simulation

Liwen hu, Lei Ma, Zhaofei Yu et al. · pku

As a neuromorphic sensor with high temporal resolution, the spike camera shows enormous potential in high-speed visual tasks. However, the high-speed sampling of light propagation processes by existing cameras brings unavoidable noise phenomena. Eliminating the unique noise in spike stream is always a key point for spike-based methods. No previous work has addressed the detailed noise mechanism of the spike camera. To this end, we propose a systematic noise model for spike camera based on its unique circuit. In addition, we carefully constructed the noise evaluation equation and experimental scenarios to measure noise variables. Based on our noise model, the first benchmark for spike stream denoising is proposed which includes clear (noisy) spike stream. Further, we design a tailored spike stream denoising framework (DnSS) where denoised spike stream is obtained by decoding inferred inter-spike intervals. Experiments show that DnSS has promising performance on the proposed benchmark. Eventually, DnSS can be generalized well on real spike stream.

LGFeb 17, 2023
A Novel Noise Injection-based Training Scheme for Better Model Robustness

Zeliang Zhang, Jinyang Jiang, Minjie Chen et al. · pku

Noise injection-based method has been shown to be able to improve the robustness of artificial neural networks in previous work. In this work, we propose a novel noise injection-based training scheme for better model robustness. Specifically, we first develop a likelihood ratio method to estimate the gradient with respect to both synaptic weights and noise levels for stochastic gradient descent training. Then, we design an approximation for the vanilla noise injection-based training method to reduce memory and improve computational efficiency. Next, we apply our proposed scheme to spiking neural networks and evaluate the performance of classification accuracy and robustness on MNIST and Fashion-MNIST datasets. Experiment results show that our proposed method achieves a much better performance on adversarial robustness and slightly better performance on original accuracy, compared with the conventional gradient-based training method.

CVMar 25Code
Brain-Inspired Multimodal Spiking Neural Network for Image-Text Retrieval

Xintao Zong, Xian Zhong, Wenxuan Liu et al.

Spiking neural networks (SNNs) have recently shown strong potential in unimodal visual and textual tasks, yet building a directly trained, low-energy, and high-performance SNN for multimodal applications such as image-text retrieval (ITR) remains highly challenging. Existing artificial neural network (ANN)-based methods often pursue richer unimodal semantics using deeper and more complex architectures, while overlooking cross-modal interaction, retrieval latency, and energy efficiency. To address these limitations, we present a brain-inspired Cross-Modal Spike Fusion network (CMSF) and apply it to ITR for the first time. The proposed spike fusion mechanism integrates unimodal features at the spike level, generating enhanced multimodal representations that act as soft supervisory signals to refine unimodal spike embeddings, effectively mitigating semantic loss within CMSF. Despite requiring only two time steps, CMSF achieves top-tier retrieval accuracy, surpassing state-of-the-art ANN counterparts while maintaining exceptionally low energy consumption and high retrieval speed. This work marks a significant step toward multimodal SNNs, offering a brain-inspired framework that unifies temporal dynamics with cross-modal alignment and provides new insights for future spiking-based multimodal research. The code is available at https://github.com/zxt6174/CMSF.

CVMar 29, 2023
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios

Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.

Self-supervised denoising has attracted widespread attention due to its ability to train without clean images. However, noise in real-world scenarios is often spatially correlated, which causes many self-supervised algorithms that assume pixel-wise independent noise to perform poorly. Recent works have attempted to break noise correlation with downsampling or neighborhood masking. However, denoising on downsampled subgraphs can lead to aliasing effects and loss of details due to a lower sampling rate. Furthermore, the neighborhood masking methods either come with high computational complexity or do not consider local spatial preservation during inference. Through the analysis of existing methods, we point out that the key to obtaining high-quality and texture-rich results in real-world self-supervised denoising tasks is to train at the original input resolution structure and use asymmetric operations during training and inference. Based on this, we propose Asymmetric Tunable Blind-Spot Network (AT-BSN), where the blind-spot size can be freely adjusted, thus better balancing noise correlation suppression and image local spatial destruction during training and inference. In addition, we regard the pre-trained AT-BSN as a meta-teacher network capable of generating various teacher networks by sampling different blind-spots. We propose a blind-spot based multi-teacher distillation strategy to distill a lightweight network, significantly improving performance. Experimental results on multiple datasets prove that our method achieves state-of-the-art, and is superior to other self-supervised algorithms in terms of computational overhead and visual effects.

CVJul 3, 2023
Unveiling the Potential of Spike Streams for Foreground Occlusion Removal from Densely Continuous Views

Jiyuan Zhang, Shiyan Chen, Yajing Zheng et al.

The extraction of a clean background image by removing foreground occlusion holds immense practical significance, but it also presents several challenges. Presently, the majority of de-occlusion research focuses on addressing this issue through the extraction and synthesis of discrete images from calibrated camera arrays. Nonetheless, the restoration quality tends to suffer when faced with dense occlusions or high-speed motions due to limited perspectives and motion blur. To successfully remove dense foreground occlusion, an effective multi-view visual information integration approach is required. Introducing the spike camera as a novel type of neuromorphic sensor offers promising capabilities with its ultra-high temporal resolution and high dynamic range. In this paper, we propose an innovative solution for tackling the de-occlusion problem through continuous multi-view imaging using only one spike camera without any prior knowledge of camera intrinsic parameters and camera poses. By rapidly moving the spike camera, we continually capture the dense stream of spikes from the occluded scene. To process the spikes, we build a novel model \textbf{SpkOccNet}, in which we integrate information of spikes from continuous viewpoints within multi-windows, and propose a novel cross-view mutual attention mechanism for effective fusion and refinement. In addition, we contribute the first real-world spike-based dataset \textbf{S-OCC} for occlusion removal. The experimental results demonstrate that our proposed model efficiently removes dense occlusions in diverse scenes while exhibiting strong generalization.

CVJul 14, 2024
SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion

Jiyuan Zhang, Kang Chen, Shiyan Chen et al.

Novel View Synthesis plays a crucial role by generating new 2D renderings from multi-view images of 3D scenes. However, capturing high-speed scenes with conventional cameras often leads to motion blur, hindering the effectiveness of 3D reconstruction. To address this challenge, high-frame-rate dense 3D reconstruction emerges as a vital technique, enabling detailed and accurate modeling of real-world objects or scenes in various fields, including Virtual Reality or embodied AI. Spike cameras, a novel type of neuromorphic sensor, continuously record scenes with an ultra-high temporal resolution, showing potential for accurate 3D reconstruction. Despite their promise, existing approaches, such as applying Neural Radiance Fields (NeRF) to spike cameras, encounter challenges due to the time-consuming rendering process. To address this issue, we make the first attempt to introduce the 3D Gaussian Splatting (3DGS) into spike cameras in high-speed capture, providing 3DGS as dense and continuous clues of views, then constructing SpikeGS. Specifically, to train SpikeGS, we establish computational equations between the rendering process of 3DGS and the processes of instantaneous imaging and exposing-like imaging of the continuous spike stream. Besides, we build a very lightweight but effective mapping process from spikes to instant images to support training. Furthermore, we introduced a new spike-based 3D rendering dataset for validation. Extensive experiments have demonstrated our method possesses the high quality of novel view rendering, proving the tremendous potential of spike cameras in modeling 3D scenes.

CVApr 16
High-Speed Full-Color HDR Imaging via Unwrapping Modulo-Encoded Spike Streams

Chu Zhou, Siqi Yang, Kailong Zhang et al.

Conventional RGB-based high dynamic range (HDR) imaging faces a fundamental trade-off between motion artifacts in multi-exposure captures and irreversible information loss in single-shot techniques. Modulo sensors offer a promising alternative by encoding theoretically unbounded dynamic range into wrapped measurements. However, existing modulo solutions remain bottlenecked by iterative unwrapping overhead and hardware constraints limiting them to low-speed, grayscale capture. In this work, we present a complete modulo-based HDR imaging system that enables high-speed, full-color HDR acquisition by synergistically advancing both the sensing formulation and the unwrapping algorithm. At the core of our approach is an exposure-decoupled formulation of modulo imaging that allows multiple measurements to be interleaved in time, preserving a clean, observation-wise measurement model. Building upon this, we introduce an iteration-free unwrapping algorithm that integrates diffusion-based generative priors with the physical least absolute remainder property of modulo images, supporting highly efficient, physics-consistent HDR reconstruction. Finally, to validate the practical viability of our system, we demonstrate a proof-of-concept hardware implementation based on modulo-encoded spike streams. This setup preserves the native high temporal resolution of spike cameras, achieving 1000 FPS full-color imaging while reducing output data bandwidth from approximately 20 Gbps to 6 Gbps. Extensive evaluations indicate that our coordinated approach successfully overcomes key systemic bottlenecks, demonstrating the feasibility of deploying modulo imaging in dynamic scenarios.

NEMar 30
PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks

Yiting Dong, Jianhao Ding, Zijie Xu et al.

Spiking Neural Networks (SNNs), with their temporal processing capabilities and biologically plausible dynamics, offer a natural platform for unsupervised representation learning. However, current unsupervised SNNs predominantly employ shallow architectures or localized plasticity rules, limiting their ability to model long-range temporal dependencies and maintain temporal feature consistency. This results in semantically unstable representations, thereby impeding the development of deep unsupervised SNNs for large-scale temporal video data. We propose PredNext, which explicitly models temporal relationships through cross-view future Step Prediction and Clip Prediction. This plug-and-play module seamlessly integrates with diverse self-supervised objectives. We firstly establish standard benchmarks for SNN self-supervised learning on UCF101, HMDB51, and MiniKinetics, which are substantially larger than conventional DVS datasets. PredNext delivers significant performance improvements across different tasks and self-supervised methods. PredNext achieves performance comparable to ImageNet-pretrained supervised weights, through unsupervised training solely on UCF101. Additional experiments demonstrate that PredNext, distinct from forced consistency constraints, substantially improves temporal feature consistency while enhancing network generalization capabilities. This work provides a effective foundation for unsupervised deep SNNs on large-scale temporal video data.

NEMar 24
A Latency Coding Framework for Deep Spiking Neural Networks with Ultra-Low Latency

Yi Lu, Jianhao Ding, Zhaofei Yu

Spiking neural networks (SNNs) offer a biologically inspired computing paradigm with significant potential for energy-efficient neural processing. Among neural coding schemes of SNNs, Time-To-First-Spike (TTFS) coding, which encodes information through the precise timing of a neuron's first spike, provides exceptional energy efficiency and biological plausibility. Despite its theoretical advantages, existing TTFS models lack efficient training methods, suffering from high inference latency and limited performance. In this work, we present a comprehensive framework, which enables the efficient training of deep TTFS-coded SNNs by employing backpropagation throuh time (BPTT) algorithm. We name the generalized TTFS coding method in our framework as latency coding. The framework includes: (1) a latency encoding (LE) module with feature extraction and straight-through estimators to address severe information loss in direct intensity-to-latency mapping and ensure smooth gradient flow; (2) relaxation of the strict single-spike constraint of traditional TTFS, allowing neurons of intermediate layers to fire multiple times to mitigating gradient vanishing in deep networks; (3) a temporal adaptive decision (TAD) loss function that dynamically weights supervision signals based on sample-dependent confidence, resolving the incompatibility between latency coding and standard cross-entropy loss. Experimental results demonstrate that our method achieves state-of-the-art accuracy in comparison to existing TTFS-coded SNNs with ultra-low inference latency and superior energy efficiency. The framework also demonstrates improved robustness against input corruptions. Our study investigates the characteristics and potential of latency coding in scenarios demanding rapid response, providing valuable insights for further exploiting the temporal learning capabilities of SNNs.

CVFeb 28, 2025Code
Towards High-performance Spiking Transformers from ANN to SNN Conversion

Zihan Huang, Xinyu Shi, Zecheng Hao et al. · pku

Spiking neural networks (SNNs) show great potential due to their energy efficiency, fast processing capabilities, and robustness. There are two main approaches to constructing SNNs. Direct training methods require much memory, while conversion methods offer a simpler and more efficient option. However, current conversion methods mainly focus on converting convolutional neural networks (CNNs) to SNNs. Converting Transformers to SNN is challenging because of the presence of non-linear modules. In this paper, we propose an Expectation Compensation Module to preserve the accuracy of the conversion. The core idea is to use information from the previous T time-steps to calculate the expected output at time-step T. We also propose a Multi-Threshold Neuron and the corresponding Parallel Parameter normalization to address the challenge of large time steps needed for high accuracy, aiming to reduce network latency and power consumption. Our experimental results demonstrate that our approach achieves state-of-the-art performance. For example, we achieve a top-1 accuracy of 88.60\% with only a 1\% loss in accuracy using 4 time steps while consuming only 35\% of the original power of the Transformer. To our knowledge, this is the first successful Artificial Neural Network (ANN) to SNN conversion for Spiking Transformers that achieves high accuracy, low latency, and low power consumption on complex datasets. The source codes of the proposed method are available at https://github.com/h-z-h-cell/Transformer-to-SNN-ECMT.

NEFeb 1, 2024Code
LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold Model

Zecheng Hao, Xinyu Shi, Yujia Liu et al. · pku

Compared to traditional Artificial Neural Network (ANN), Spiking Neural Network (SNN) has garnered widespread academic interest for its intrinsic ability to transmit information in a more energy-efficient manner. However, despite previous efforts to optimize the learning algorithm of SNNs through various methods, SNNs still lag behind ANNs in terms of performance. The recently proposed multi-threshold model provides more possibilities for further enhancing the learning capability of SNNs. In this paper, we rigorously analyze the relationship among the multi-threshold model, vanilla spiking model and quantized ANNs from a mathematical perspective, then propose a novel LM-HT model, which is an equidistant multi-threshold model that can dynamically regulate the global input current and membrane potential leakage on the time dimension. The LM-HT model can also be transformed into a vanilla single threshold model through reparameterization, thereby achieving more flexible hardware deployment. In addition, we note that the LM-HT model can seamlessly integrate with ANN-SNN Conversion framework under special initialization. This novel hybrid learning framework can effectively improve the relatively poor performance of converted SNNs under low time latency. Extensive experimental results have demonstrated that our model can outperform previous state-of-the-art works on various types of datasets, which promote SNNs to achieve a brand-new level of performance comparable to quantized ANNs. Code is available at https://github.com/hzc1208/LMHT_SNN.

CVMar 1, 2025Code
Differential Coding for Training-Free ANN-to-SNN Conversion

Zihan Huang, Wei Fang, Tong Bu et al. · pku

Spiking Neural Networks (SNNs) exhibit significant potential due to their low energy consumption. Converting Artificial Neural Networks (ANNs) to SNNs is an efficient way to achieve high-performance SNNs. However, many conversion methods are based on rate coding, which requires numerous spikes and longer time-steps compared to directly trained SNNs, leading to increased energy consumption and latency. This article introduces differential coding for ANN-to-SNN conversion, a novel coding scheme that reduces spike counts and energy consumption by transmitting changes in rate information rather than rates directly, and explores its application across various layers. Additionally, the threshold iteration method is proposed to optimize thresholds based on activation distribution when converting Rectified Linear Units (ReLUs) to spiking neurons. Experimental results on various Convolutional Neural Networks (CNNs) and Transformers demonstrate that the proposed differential coding significantly improves accuracy while reducing energy consumption, particularly when combined with the threshold iteration method, achieving state-of-the-art performance. The source codes of the proposed method are available at https://github.com/h-z-h-cell/ANN-to-SNN-DCGS.

CVNov 15, 2024Code
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

Kang Chen, Jiyuan Zhang, Zecheng Hao et al. · pku

Spike cameras, as an innovative neuromorphic camera that captures scenes with the 0-1 bit stream at 40 kHz, are increasingly employed for the 3D reconstruction task via Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS). Previous spike-based 3D reconstruction approaches often employ a casecased pipeline: starting with high-quality image reconstruction from spike streams based on established spike-to-image reconstruction algorithms, then progressing to camera pose estimation and 3D reconstruction. However, this cascaded approach suffers from substantial cumulative errors, where quality limitations of initial image reconstructions negatively impact pose estimation, ultimately degrading the fidelity of the 3D reconstruction. To address these issues, we propose a synergistic optimization framework, \textbf{USP-Gaussian}, that unifies spike-based image reconstruction, pose correction, and Gaussian splatting into an end-to-end framework. Leveraging the multi-view consistency afforded by 3DGS and the motion capture capability of the spike camera, our framework enables a joint iterative optimization that seamlessly integrates information between the spike-to-image network and 3DGS. Experiments on synthetic datasets with accurate poses demonstrate that our method surpasses previous approaches by effectively eliminating cascading errors. Moreover, we integrate pose optimization to achieve robust 3D reconstruction in real-world scenarios with inaccurate initial poses, outperforming alternative methods by effectively reducing noise and preserving fine texture details. Our code, data and trained models will be available at https://github.com/chenkang455/USP-Gaussian.

NEJan 29
Error Amplification Limits ANN-to-SNN Conversion in Continuous Control

Zijie Xu, Zihan Huang, Yiting Dong et al.

Spiking Neural Networks (SNNs) can achieve competitive performance by converting already existing well-trained Artificial Neural Networks (ANNs), avoiding further costly training. This property is particularly attractive in Reinforcement Learning (RL), where training through environment interaction is expensive and potentially unsafe. However, existing conversion methods perform poorly in continuous control, where suitable baselines are largely absent. We identify error amplification as the key cause: small action approximation errors become temporally correlated across decision steps, inducing cumulative state distribution shift and severe performance degradation. To address this issue, we propose Cross-Step Residual Potential Initialization (CRPI), a lightweight training-free mechanism that carries over residual membrane potentials across decision steps to suppress temporally correlated errors. Experiments on continuous control benchmarks with both vector and visual observations demonstrate that CRPI can be integrated into existing conversion pipelines and substantially recovers lost performance. Our results highlight continuous control as a critical and challenging benchmark for ANN-to-SNN conversion, where small errors can be strongly amplified and impact performance.

CVMay 1, 2025Code
SOTA: Spike-Navigated Optimal TrAnsport Saliency Region Detection in Composite-bias Videos

Wenxuan Liu, Yao Deng, Kang Chen et al.

Existing saliency detection methods struggle in real-world scenarios due to motion blur and occlusions. In contrast, spike cameras, with their high temporal resolution, significantly enhance visual saliency maps. However, the composite noise inherent to spike camera imaging introduces discontinuities in saliency detection. Low-quality samples further distort model predictions, leading to saliency bias. To address these challenges, we propose Spike-navigated Optimal TrAnsport Saliency Region Detection (SOTA), a framework that leverages the strengths of spike cameras while mitigating biases in both spatial and temporal dimensions. Our method introduces Spike-based Micro-debias (SM) to capture subtle frame-to-frame variations and preserve critical details, even under minimal scene or lighting changes. Additionally, Spike-based Global-debias (SG) refines predictions by reducing inconsistencies across diverse conditions. Extensive experiments on real and synthetic datasets demonstrate that SOTA outperforms existing methods by eliminating composite noise bias. Our code and dataset will be released at https://github.com/lwxfight/sota.

CVApr 26, 2025Code
Spike Imaging Velocimetry: Dense Motion Estimation of Fluids Using Spike Cameras

Yunzhong Zhang, Bo Xiong, You Zhou et al.

The need for accurate and non-intrusive flow measurement methods has led to the widespread adoption of Particle Image Velocimetry (PIV), a powerful diagnostic tool in fluid motion estimation. This study investigates the tremendous potential of spike cameras (a type of ultra-high-speed, high-dynamic-range camera) in PIV. We propose a deep learning framework, Spike Imaging Velocimetry (SIV), designed specifically for highly turbulent and intricate flow fields. To aggregate motion features from the spike stream while minimizing information loss, we incorporate a Detail-Preserving Hierarchical Transform (DPHT) module. Additionally, we introduce a Graph Encoder (GE) to extract contextual features from highly complex fluid flows. Furthermore, we present a spike-based PIV dataset, Particle Scenes with Spike and Displacement (PSSD), which provides labeled data for three challenging fluid dynamics scenarios. Our proposed method achieves superior performance compared to existing baseline methods on PSSD. The datasets and our implementation of SIV are open-sourced in the supplementary materials.

NEMar 20, 2025Code
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition

Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

Spiking Neural Networks (SNNs) based on Transformers have garnered significant attention due to their superior performance and high energy efficiency. However, the spiking attention modules of most existing Transformer-based SNNs are adapted from those of analog Transformers, failing to fully address the issue of over-allocating attention to irrelevant contexts. To fix this fundamental yet overlooked issue, we propose a Lateral Inhibition-inspired Spiking Transformer (SpiLiFormer). It emulates the brain's lateral inhibition mechanism, guiding the model to enhance attention to relevant tokens while suppressing attention to irrelevant ones. Our model achieves state-of-the-art (SOTA) performance across multiple datasets, including CIFAR-10 (+0.45%), CIFAR-100 (+0.48%), CIFAR10-DVS (+2.70%), N-Caltech101 (+1.94%), and ImageNet-1K (+1.6%). Notably, on the ImageNet-1K dataset, SpiLiFormer (69.9M parameters, 4 time steps, 384 resolution) outperforms E-SpikeFormer (173.0M parameters, 8 time steps, 384 resolution), a SOTA spiking Transformer, by 0.46% using only 39% of the parameters and half the time steps. The code and model checkpoints are publicly available at https://github.com/KirinZheng/SpiLiFormer.

LGMay 10
Uncertainty-Aware Token Importance Estimation in Spiking Transformers

Wenxuan Liu, Zecheng Hao, Tong Bu et al.

Spiking transformers have shown strong potential for neuromorphic vision, yet their token processing across multiple spiking steps still introduces substantial redundancy and inference cost. Existing token reduction methods mainly rely on response based cues, such as activation magnitude, firing statistics, or feature similarity. Although effective, these criteria do not explicitly characterize token importance from the perspective of temporally evolving class evidence. In spiking transformers, token representations are progressively formed across multiple spiking steps rather than determined at a single instant, suggesting that token importance should be evaluated not only by instantaneous responses but also by temporal uncertainty patterns. Our key observation is that tokens exhibit heterogeneous uncertainty trajectories over time, and that their temporally aggregated uncertainty statistics provide an effective cue for distinguishing informative tokens from redundant ones. Motivated by this, we propose Uncert, a training free and plug and play token importance estimation framework for spiking transformers. Specifically, Uncert models token wise class evidence with a Dirichlet distribution and summarizes each token temporal uncertainty using its mean and fluctuation across spiking steps, yielding an uncertainty aware importance score for token reduction during inference. Experiments on both static and neuromorphic benchmarks show that Uncert achieves favorable accuracy and efficiency tradeoffs, with the most consistent gains observed under token pruning. Further analysis reveals a clear empirical connection between temporal uncertainty patterns and token contribution, offering new insights into token dynamics in spiking transformers.

LGOct 29, 2025Code
$π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

Kang Chen, Zhihao Liu, Tonghe Zhang et al.

Vision-Language-Action (VLA) models enable robots to understand and perform complex tasks from multimodal input. Although recent work explores using reinforcement learning (RL) to automate the laborious data collection process in scaling supervised fine-tuning (SFT), applying large-scale RL to flow-based VLAs (e.g., $π_0$, $π_{0.5}$) remains challenging due to intractable action log-likelihoods from iterative denoising. We address this challenge with $π_{\text{RL}}$, an open-source framework for training flow-based VLAs in parallel simulation. $π_{\text{RL}}$ implements two RL algorithms: (1) {Flow-Noise} models the denoising process as a discrete-time MDP with a learnable noise network for exact log-likelihood computation. (2) {Flow-SDE} integrates denoising with agent-environment interaction, formulating a two-layer MDP that employs ODE-to-SDE conversion for efficient RL exploration. We evaluate $π_{\text{RL}}$ on LIBERO and ManiSkill benchmarks. On LIBERO, $π_{\text{RL}}$ boosts few-shot SFT models $π_0$ and $π_{0.5}$ from 57.6% to 97.6% and from 77.1% to 98.3%, respectively. In ManiSkill, we train $π_{\text{RL}}$ in 320 parallel environments, improving $π_0$ from 41.6% to 85.7% and $π_{0.5}$ from 40.0% to 84.8% across 4352 pick-and-place tasks, demonstrating scalable multitask RL under heterogeneous simulation. Overall, $π_{\text{RL}}$ achieves significant performance gains and stronger generalization over SFT-models, validating the effectiveness of online RL for flow-based VLAs.

NEAug 1, 2025Code
STF: Shallow-Level Temporal Feedback to Enhance Spiking Transformers

Zeqi Zheng, Zizheng Zhu, Yingchao Yu et al.

Transformer-based Spiking Neural Networks (SNNs) suffer from a great performance gap compared to floating-point \mbox{Artificial} Neural Networks (ANNs) due to the binary nature of spike trains. Recent efforts have introduced deep-level feedback loops to transmit high-level semantic information to narrow this gap. However, these designs often span \mbox{multiple} deep layers, resulting in costly feature transformations, higher parameter overhead, increased energy consumption, and longer inference latency. To address this issue, we propose Shallow-level Temporal Feedback (STF), a lightweight plug-and-play module for the encoding layer, which consists of Temporal-Spatial Position Embedding (TSPE) and Temporal Feedback (TF). Extensive experiments show that STF consistently improves performance across various Transformer-based SNN backbones on static datasets, including CIFAR-10, CIFAR-100, and ImageNet-1K, under different spike timestep settings. Further analysis reveals that STF enhances the diversity of spike patterns, which is key to performance gain. Moreover, evaluations on adversarial robustness and temporal sensitivity confirm that STF outperforms direct coding and its variants, highlighting its potential as a new spike encoding scheme for static scenarios. Our code will be released upon acceptance.

NEMay 30, 2025Code
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control

Zijie Xu, Tong Bu, Zecheng Hao et al. · pku

Spiking Neural Networks (SNNs) offer low-latency and energy-efficient decision making on neuromorphic hardware, making them attractive for Reinforcement Learning (RL) in resource-constrained edge devices. However, most RL algorithms for continuous control are designed for Artificial Neural Networks (ANNs), particularly the target network soft update mechanism, which conflicts with the discrete and non-differentiable dynamics of spiking neurons. We show that this mismatch destabilizes SNN training and degrades performance. To bridge the gap between discrete SNNs and continuous-control algorithms, we propose a novel proxy target framework. The proxy network introduces continuous and differentiable dynamics that enable smooth target updates, stabilizing the learning process. Since the proxy operates only during training, the deployed SNN remains fully energy-efficient with no additional inference overhead. Extensive experiments on continuous control benchmarks demonstrate that our framework consistently improves stability and achieves up to $32\%$ higher performance across various spiking neuron models. Notably, to the best of our knowledge, this is the first approach that enables SNNs with simple Leaky Integrate and Fire (LIF) neurons to surpass their ANN counterparts in continuous control. This work highlights the importance of SNN-tailored RL algorithms and paves the way for neuromorphic agents that combine high performance with low power consumption. Code is available at https://github.com/xuzijie32/Proxy-Target.

NEDec 18, 2024Code
Faster and Stronger: When ANN-SNN Conversion Meets Parallel Spiking Calculation

Zecheng Hao, Qichao Ma, Kang Chen et al. · pku

Spiking Neural Network (SNN), as a brain-inspired and energy-efficient network, is currently facing the pivotal challenge of exploring a suitable and efficient learning framework. The predominant training methodologies, namely Spatial-Temporal Back-propagation (STBP) and ANN-SNN Conversion, are encumbered by substantial training overhead or pronounced inference latency, which impedes the advancement of SNNs in scaling to larger networks and navigating intricate application domains. In this work, we propose a novel parallel conversion learning framework, which establishes a mathematical mapping relationship between each time-step of the parallel spiking neurons and the cumulative spike firing rate. We theoretically validate the lossless and sorting properties of the conversion process, as well as pointing out the optimal shifting distance for each step. Furthermore, by integrating the above framework with the distribution-aware error calibration technique, we can achieve efficient conversion towards more general activation functions or training-free circumstance. Extensive experiments have confirmed the significant performance advantages of our method for various conversion cases under ultra-low time latency. To our best knowledge, this is the first work which jointly utilizes parallel spiking calculation and ANN-SNN Conversion, providing a highly promising approach for SNN supervised training. Code is available at https://github.com/hzc1208/Parallel_Conversion.

CVMar 14, 2024Code
SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams

Kang Chen, Shiyan Chen, Jiyuan Zhang et al.

Reconstructing a sequence of sharp images from the blurry input is crucial for enhancing our insights into the captured scene and poses a significant challenge due to the limited temporal features embedded in the image. Spike cameras, sampling at rates up to 40,000 Hz, have proven effective in capturing motion features and beneficial for solving this ill-posed problem. Nonetheless, existing methods fall into the supervised learning paradigm, which suffers from notable performance degradation when applied to real-world scenarios that diverge from the synthetic training data domain. Moreover, the quality of reconstructed images is capped by the generated images based on motion analysis interpolation, which inherently differs from the actual scene, affecting the generalization ability of these methods in real high-speed scenarios. To address these challenges, we propose the first self-supervised framework for the task of spike-guided motion deblurring. Our approach begins with the formulation of a spike-guided deblurring model that explores the theoretical relationships among spike streams, blurry images, and their corresponding sharp sequences. We subsequently develop a self-supervised cascaded framework to alleviate the issues of spike noise and spatial-resolution mismatching encountered in the deblurring model. With knowledge distillation and re-blurring loss, we further design a lightweight deblur network to generate high-quality sequences with brightness and texture consistency with the original input. Quantitative and qualitative experiments conducted on our real-world and synthetic datasets with spikes validate the superior generalization of the proposed framework. Our code, data and trained models will be available at \url{https://github.com/chenkang455/S-SDM}.

CVSep 10, 2021Code
Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation

Ziluo Ding, Rui Zhao, Jiyuan Zhang et al.

Event camera has offered promising alternative for visual perception, especially in high speed and high dynamic range scenes. Recently, many deep learning methods have shown great success in providing promising solutions to many event-based problems, such as optical flow estimation. However, existing deep learning methods did not address the importance of temporal information well from the perspective of architecture design and cannot effectively extract spatio-temporal features. Another line of research that utilizes Spiking Neural Network suffers from training issues for deeper architecture.To address these points, a novel input representation is proposed that captures the events' temporal distribution for signal enhancement. Moreover, we introduce a spatio-temporal recurrent encoding-decoding neural network architecture for event-based optical flow estimation, which utilizes Convolutional Gated Recurrent Units to extract feature maps from a series of event images. Besides, our architecture allows some traditional frame-based core modules, such as correlation layer and iterative residual refine scheme, to be incorporated. The network is end-to-end trained with self-supervised learning on the Multi-Vehicle Stereo Event Camera dataset. We have shown that it outperforms all the existing state-of-the-art methods by a large margin. The code link is https://github.com/ruizhao26/STE-FlowNet.

NEMay 25, 2021Code
Optimal ANN-SNN Conversion for Fast and Accurate Inference in Deep Spiking Neural Networks

Jianhao Ding, Zhaofei Yu, Yonghong Tian et al.

Spiking Neural Networks (SNNs), as bio-inspired energy-efficient neural networks, have attracted great attentions from researchers and industry. The most efficient way to train deep SNNs is through ANN-SNN conversion. However, the conversion usually suffers from accuracy loss and long inference time, which impede the practical application of SNN. In this paper, we theoretically analyze ANN-SNN conversion and derive sufficient conditions of the optimal conversion. To better correlate ANN-SNN and get greater accuracy, we propose Rate Norm Layer to replace the ReLU activation function in source ANN training, enabling direct conversion from a trained ANN to an SNN. Moreover, we propose an optimal fit curve to quantify the fit between the activation value of source ANN and the actual firing rate of target SNN. We show that the inference time can be reduced by optimizing the upper bound of the fit curve in the revised ANN to achieve fast inference. Our theory can explain the existing work on fast reasoning and get better results. The experimental results show that the proposed method achieves near loss less conversion with VGG-16, PreActResNet-18, and deeper structures. Moreover, it can reach 8.6x faster reasoning performance under 0.265x energy consumption of the typical method. The code is available at https://github.com/DingJianhao/OptSNNConvertion-RNL-RIL.

NEMay 11, 2021Code
Pruning of Deep Spiking Neural Networks through Gradient Rewiring

Yanqi Chen, Zhaofei Yu, Wei Fang et al.

Spiking Neural Networks (SNNs) have been attached great importance due to their biological plausibility and high energy-efficiency on neuromorphic chips. As these chips are usually resource-constrained, the compression of SNNs is thus crucial along the road of practical use of SNNs. Most existing methods directly apply pruning approaches in artificial neural networks (ANNs) to SNNs, which ignore the difference between ANNs and SNNs, thus limiting the performance of the pruned SNNs. Besides, these methods are only suitable for shallow SNNs. In this paper, inspired by synaptogenesis and synapse elimination in the neural system, we propose gradient rewiring (Grad R), a joint learning algorithm of connectivity and weight for SNNs, that enables us to seamlessly optimize network structure without retraining. Our key innovation is to redefine the gradient to a new synaptic parameter, allowing better exploration of network structures by taking full advantage of the competition between pruning and regrowth of connections. The experimental results show that the proposed method achieves minimal loss of SNNs' performance on MNIST and CIFAR-10 dataset so far. Moreover, it reaches a $\sim$3.5% accuracy loss under unprecedented 0.73% connectivity, which reveals remarkable structure refining capability in SNNs. Our work suggests that there exists extremely high redundancy in deep SNNs. Our codes are available at https://github.com/Yanqi-Chen/Gradient-Rewiring.

NEFeb 8, 2021Code
Deep Residual Learning in Spiking Neural Networks

Wei Fang, Zhaofei Yu, Yanqi Chen et al.

Deep Spiking Neural Networks (SNNs) present optimization difficulties for gradient-based approaches due to discrete binary activation and complex spatial-temporal dynamics. Considering the huge success of ResNet in deep learning, it would be natural to train deep SNNs with residual learning. Previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation layers with spiking neurons, which suffers the degradation problem and can hardly implement residual learning. In this paper, we propose the spike-element-wise (SEW) ResNet to realize residual learning in deep SNNs. We prove that the SEW ResNet can easily implement identity mapping and overcome the vanishing/exploding gradient problems of Spiking ResNet. We evaluate our SEW ResNet on ImageNet, DVS Gesture, and CIFAR10-DVS datasets, and show that SEW ResNet outperforms the state-of-the-art directly trained SNNs in both accuracy and time-steps. Moreover, SEW ResNet can achieve higher performance by simply adding more layers, providing a simple method to train deep SNNs. To our best knowledge, this is the first time that directly training deep SNNs with more than 100 layers becomes possible. Our codes are available at https://github.com/fangwei123456/Spike-Element-Wise-ResNet.

NEJul 11, 2020Code
Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks

Wei Fang, Zhaofei Yu, Yanqi Chen et al.

Spiking Neural Networks (SNNs) have attracted enormous research interest due to temporal information processing capability, low power consumption, and high biological plausibility. However, the formulation of efficient and high-performance learning algorithms for SNNs is still challenging. Most existing learning methods learn weights only, and require manual tuning of the membrane-related parameters that determine the dynamics of a single spiking neuron. These parameters are typically chosen to be the same for all neurons, which limits the diversity of neurons and thus the expressiveness of the resulting SNNs. In this paper, we take inspiration from the observation that membrane-related parameters are different across brain regions, and propose a training algorithm that is capable of learning not only the synaptic weights but also the membrane time constants of SNNs. We show that incorporating learnable membrane time constants can make the network less sensitive to initial values and can speed up learning. In addition, we reevaluate the pooling methods in SNNs and find that max-pooling will not lead to significant information loss and have the advantage of low computation cost and binary compatibility. We evaluate the proposed method for image classification tasks on both traditional static MNIST, Fashion-MNIST, CIFAR-10 datasets, and neuromorphic N-MNIST, CIFAR10-DVS, DVS128 Gesture datasets. The experiment results show that the proposed method outperforms the state-of-the-art accuracy on nearly all datasets, using fewer time-steps. Our codes are available at https://github.com/fangwei123456/Parametric-Leaky-Integrate-and-Fire-Spiking-Neuron.

NEMar 21, 2024
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Xinyu Shi, Zecheng Hao, Zhaofei Yu · pku

The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architecture into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack reasonable scaling methods, and the overall architectures proposed by these methods suffer from a bottleneck in effectively extracting local features. To address these challenges, we propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method. Based on DSSA, we propose a novel spiking Vision Transformer architecture called SpikingResformer, which combines the ResNet-based multi-stage architecture with our proposed DSSA to improve both performance and energy efficiency while reducing parameters. Experimental results show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformer counterparts. Notably, our SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time-steps, which is the state-of-the-art result in the SNN field.

IVNov 3, 2023
INeAT: Iterative Neural Adaptive Tomography

Bo Xiong, Changqing Su, Zihan Lin et al.

Computed Tomography (CT) with its remarkable capability for three-dimensional imaging from multiple projections, enjoys a broad range of applications in clinical diagnosis, scientific observation, and industrial detection. Neural Adaptive Tomography (NeAT) is a recently proposed 3D rendering method based on neural radiance field for CT, and it demonstrates superior performance compared to traditional methods. However, it still faces challenges when dealing with the substantial perturbations and pose shifts encountered in CT scanning processes. Here, we propose a neural rendering method for CT reconstruction, named Iterative Neural Adaptive Tomography (INeAT), which incorporates iterative posture optimization to effectively counteract the influence of posture perturbations in data, particularly in cases involving significant posture variations. Through the implementation of a posture feedback optimization strategy, INeAT iteratively refines the posture corresponding to the input images based on the reconstructed 3D volume. We demonstrate that INeAT achieves artifact-suppressed and resolution-enhanced reconstruction in scenarios with significant pose disturbances. Furthermore, we show that our INeAT maintains comparable reconstruction performance to stable-state acquisitions even using data from unstable-state acquisitions, which significantly reduces the time required for CT scanning and relaxes the stringent requirements on imaging hardware systems, underscoring its immense potential for applications in short-time and low-cost CT technology.

NEApr 10
Ge$^\text{2}$mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer

Zecheng Hao, Shenghao Xie, Kang Chen et al.

Spiking Neural Networks (SNNs) offer superior energy efficiency over Artificial Neural Networks (ANNs). However, they encounter significant deficiencies in training and inference metrics when applied to Spiking Vision Transformers (S-ViTs). Existing paradigms including ANN-SNN Conversion and Spatial-Temporal Backpropagation (STBP) suffer from inherent limitations, precluding concurrent optimization of memory, accuracy and energy consumption. To address these issues, we propose Ge$^\text{2}$mS-T, a novel architecture implementing grouped computation across temporal, spatial and network structure dimensions. Specifically, we introduce the Grouped-Exponential-Coding-based IF (ExpG-IF) model, enabling lossless conversion with constant training overhead and precise regulation for spike patterns. Additionally, we develop Group-wise Spiking Self-Attention (GW-SSA) to reduce computational complexity via multi-scale token grouping and multiplication-free operations within a hybrid attention-convolution framework. Experiments confirm that our method can achieve superior performance with ultra-high energy efficiency on challenging benchmarks. To our best knowledge, this is the first work to systematically establish multi-dimensional grouped computation for resolving the triad of memory overhead, learning capability and energy budget in S-ViTs.

CVJan 8, 2025
Rethinking High-speed Image Reconstruction Framework with Spike Camera

Kang Chen, Yajing Zheng, Tiejun Huang et al.

Spike cameras, as innovative neuromorphic devices, generate continuous spike streams to capture high-speed scenes with lower bandwidth and higher dynamic range than traditional RGB cameras. However, reconstructing high-quality images from the spike input under low-light conditions remains challenging. Conventional learning-based methods often rely on the synthetic dataset as the supervision for training. Still, these approaches falter when dealing with noisy spikes fired under the low-light environment, leading to further performance degradation in the real-world dataset. This phenomenon is primarily due to inadequate noise modelling and the domain gap between synthetic and real datasets, resulting in recovered images with unclear textures, excessive noise, and diminished brightness. To address these challenges, we introduce a novel spike-to-image reconstruction framework SpikeCLIP that goes beyond traditional training paradigms. Leveraging the CLIP model's powerful capability to align text and images, we incorporate the textual description of the captured scene and unpaired high-quality datasets as the supervision. Our experiments on real-world low-light datasets U-CALTECH and U-CIFAR demonstrate that SpikeCLIP significantly enhances texture details and the luminance balance of recovered images. Furthermore, the reconstructed images are well-aligned with the broader visual features needed for downstream tasks, ensuring more robust and versatile performance in challenging environments.

NEDec 24, 2023
Deep Pulse-Coupled Neural Networks

Zexiang Yi, Jing Lian, Yunliang Qi et al.

Spiking Neural Networks (SNNs) capture the information processing mechanism of the brain by taking advantage of spiking neurons, such as the Leaky Integrate-and-Fire (LIF) model neuron, which incorporates temporal dynamics and transmits information via discrete and asynchronous spikes. However, the simplified biological properties of LIF ignore the neuronal coupling and dendritic structure of real neurons, which limits the spatio-temporal dynamics of neurons and thus reduce the expressive power of the resulting SNNs. In this work, we leverage a more biologically plausible neural model with complex dynamics, i.e., a pulse-coupled neural network (PCNN), to improve the expressiveness and recognition performance of SNNs for vision tasks. The PCNN is a type of cortical model capable of emulating the complex neuronal activities in the primary visual cortex. We construct deep pulse-coupled neural networks (DPCNNs) by replacing commonly used LIF neurons in SNNs with PCNN neurons. The intra-coupling in existing PCNN models limits the coupling between neurons only within channels. To address this limitation, we propose inter-channel coupling, which allows neurons in different feature maps to interact with each other. Experimental results show that inter-channel coupling can efficiently boost performance with fewer neurons, synapses, and less training time compared to widening the networks. For instance, compared to the LIF-based SNN with wide VGG9, DPCNN with VGG9 uses only 50%, 53%, and 73% of neurons, synapses, and training time, respectively. Furthermore, we propose receptive field and time dependent batch normalization (RFTD-BN) to speed up the convergence and performance of DPCNNs.

NCJan 8, 2024
Deep Learning for Visual Neuroprosthesis

Peter Beech, Shanshan Jia, Zhaofei Yu et al.

The visual pathway involves complex networks of cells and regions which contribute to the encoding and processing of visual information. While some aspects of visual perception are understood, there are still many unanswered questions regarding the exact mechanisms of visual encoding and the organization of visual information along the pathway. This chapter discusses the importance of visual perception and the challenges associated with understanding how visual information is encoded and represented in the brain. Furthermore, this chapter introduces the concept of neuroprostheses: devices designed to enhance or replace bodily functions, and highlights the importance of constructing computational models of the visual pathway in the implementation of such devices. A number of such models, employing the use of deep learning models, are outlined, and their value to understanding visual coding and natural vision is discussed.

CVNov 19, 2025
Driving in Spikes: An Entropy-Guided Object Detector for Spike Cameras

Ziyan Liu, Qi Su, Lulu Tang et al.

Object detection in autonomous driving suffers from motion blur and saturation under fast motion and extreme lighting. Spike cameras, offer microsecond latency and ultra high dynamic range for object detection by using per pixel asynchronous integrate and fire. However, their sparse, discrete output cannot be processed by standard image-based detectors, posing a critical challenge for end to end spike stream detection. We propose EASD, an end to end spike camera detector with a dual branch design: a Temporal Based Texture plus Feature Fusion branch for global cross slice semantics, and an Entropy Selective Attention branch for object centric details. To close the data gap, we introduce DSEC Spike, the first driving oriented simulated spike detection benchmark.

CVOct 22, 2025
HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking

Yao Deng, Xian Zhong, Wenxuan Liu et al.

RGB cameras excel at capturing rich texture details with high spatial resolution, whereas event cameras offer exceptional temporal resolution and a high dynamic range (HDR). Leveraging their complementary strengths can substantially enhance object tracking under challenging conditions, such as high-speed motion, HDR environments, and dynamic background interference. However, a significant spatio-temporal asymmetry exists between these two modalities due to their fundamentally different imaging mechanisms, hindering effective multi-modal integration. To address this issue, we propose {Hierarchical Asymmetric Distillation} (HAD), a multi-modal knowledge distillation framework that explicitly models and mitigates spatio-temporal asymmetries. Specifically, HAD proposes a hierarchical alignment strategy that minimizes information loss while maintaining the student network's computational efficiency and parameter compactness. Extensive experiments demonstrate that HAD consistently outperforms state-of-the-art methods, and comprehensive ablation studies further validate the effectiveness and necessity of each designed component. The code will be released soon.

ROOct 12, 2025
SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams

Zhuoheng Gao, Jiyao Zhang, Zhiyong Xie et al.

Most robotic grasping systems rely on converting sensor data into explicit 3D point clouds, which is a computational step not found in biological intelligence. This paper explores a fundamentally different, neuro-inspired paradigm for 6-DoF grasp detection. We introduce SpikeGrasp, a framework that mimics the biological visuomotor pathway, processing raw, asynchronous events from stereo spike cameras, similarly to retinas, to directly infer grasp poses. Our model fuses these stereo spike streams and uses a recurrent spiking neural network, analogous to high-level visual processing, to iteratively refine grasp hypotheses without ever reconstructing a point cloud. To validate this approach, we built a large-scale synthetic benchmark dataset. Experiments show that SpikeGrasp surpasses traditional point-cloud-based baselines, especially in cluttered and textureless scenes, and demonstrates remarkable data efficiency. By establishing the viability of this end-to-end, neuro-inspired approach, SpikeGrasp paves the way for future systems capable of the fluid and efficient manipulation seen in nature, particularly for dynamic objects.

CVMay 26, 2025
SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams

Zhuoheng Gao, Yihao Li, Jiyao Zhang et al. · pku

Conventional frame-based cameras often struggle with stereo depth estimation in rapidly changing scenes. In contrast, bio-inspired spike cameras emit asynchronous events at microsecond-level resolution, providing an alternative sensing modality. However, existing methods lack specialized stereo algorithms and benchmarks tailored to the spike data. To address this gap, we propose SpikeStereoNet, a brain-inspired framework and the first to estimate stereo depth directly from raw spike streams. The model fuses raw spike streams from two viewpoints and iteratively refines depth estimation through a recurrent spiking neural network (RSNN) update module. To benchmark our approach, we introduce a large-scale synthetic spike stream dataset and a real-world stereo spike dataset with dense depth annotations. SpikeStereoNet outperforms existing methods on both datasets by leveraging spike streams' ability to capture subtle edges and intensity shifts in challenging regions such as textureless surfaces and extreme lighting conditions. Furthermore, our framework exhibits strong data efficiency, maintaining high accuracy even with substantially reduced training data. The source code and datasets will be publicly available.

CVMay 19, 2025
SPKLIP: Aligning Spike Video Streams with Natural Language

Yongchang Gao, Meiling Jin, Zhaofei Yu et al.

Spike cameras offer unique sensing capabilities but their sparse, asynchronous output challenges semantic understanding, especially for Spike Video-Language Alignment (Spike-VLA) where models like CLIP underperform due to modality mismatch. We introduce SPKLIP, the first architecture specifically for Spike-VLA. SPKLIP employs a hierarchical spike feature extractor that adaptively models multi-scale temporal dynamics in event streams, and uses spike-text contrastive learning to directly align spike video with language, enabling effective few-shot learning. A full-spiking visual encoder variant, integrating SNN components into our pipeline, demonstrates enhanced energy efficiency. Experiments show state-of-the-art performance on benchmark spike datasets and strong few-shot generalization on a newly contributed real-world dataset. SPKLIP's energy efficiency highlights its potential for neuromorphic deployment, advancing event-based multimodal research. The source code and dataset are available at [link removed for anonymity].

NEMay 17, 2025
TDFormer: A Top-Down Attention-Controlled Spiking Transformer

Zizheng Zhu, Yingchao Yu, Zeqi Zheng et al.

Traditional spiking neural networks (SNNs) can be viewed as a combination of multiple subnetworks with each running for one time step, where the parameters are shared, and the membrane potential serves as the only information link between them. However, the implicit nature of the membrane potential limits its ability to effectively represent temporal information. As a result, each time step cannot fully leverage information from previous time steps, seriously limiting the model's performance. Inspired by the top-down mechanism in the brain, we introduce TDFormer, a novel model with a top-down feedback structure that functions hierarchically and leverages high-order representations from earlier time steps to modulate the processing of low-order information at later stages. The feedback structure plays a role from two perspectives: 1) During forward propagation, our model increases the mutual information across time steps, indicating that richer temporal information is being transmitted and integrated in different time steps. 2) During backward propagation, we theoretically prove that the feedback structure alleviates the problem of vanishing gradients along the time dimension. We find that these mechanisms together significantly and consistently improve the model performance on multiple datasets. In particular, our model achieves state-of-the-art performance on ImageNet with an accuracy of 86.83%.

CVApr 7, 2025
Inter-event Interval Microscopy for Event Cameras

Changqing Su, Yanqin Chen, Zihan Lin et al.

Event cameras, an innovative bio-inspired sensor, differ from traditional cameras by sensing changes in intensity rather than directly perceiving intensity and recording these variations as a continuous stream of "events". The intensity reconstruction from these sparse events has long been a challenging problem. Previous approaches mainly focused on transforming motion-induced events into videos or achieving intensity imaging for static scenes by integrating modulation devices at the event camera acquisition end. In this paper, for the first time, we achieve event-to-intensity conversion using a static event camera for both static and dynamic scenes in fluorescence microscopy. Unlike conventional methods that primarily rely on event integration, the proposed Inter-event Interval Microscopy (IEIM) quantifies the time interval between consecutive events at each pixel. With a fixed threshold in the event camera, the time interval can precisely represent the intensity. At the hardware level, the proposed IEIM integrates a pulse light modulation device within a microscope equipped with an event camera, termed Pulse Modulation-based Event-driven Fluorescence Microscopy. Additionally, we have collected IEIMat dataset under various scenes including high dynamic range and high-speed scenarios. Experimental results on the IEIMat dataset demonstrate that the proposed IEIM achieves superior spatial and temporal resolution, as well as a higher dynamic range, with lower bandwidth compared to other methods. The code and the IEIMat dataset will be made publicly available.

CVMar 26, 2025
SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams

Hanwen Liang, Xian Zhong, Wenxuan Liu et al.

Restoring clear frames from rainy videos presents a significant challenge due to the rapid motion of rain streaks. Traditional frame-based visual sensors, which capture scene content synchronously, struggle to capture the fast-moving details of rain accurately. In recent years, neuromorphic sensors have introduced a new paradigm for dynamic scene perception, offering microsecond temporal resolution and high dynamic range. However, existing multimodal methods that fuse event streams with RGB images face difficulties in handling the complex spatiotemporal interference of raindrops in real scenes, primarily due to hardware synchronization errors and computational redundancy. In this paper, we propose a Color Spike Stream Deraining Network (SpikeDerain), capable of reconstructing spike streams of dynamic scenes and accurately removing rain streaks. To address the challenges of data scarcity in real continuous rainfall scenes, we design a physically interpretable rain streak synthesis model that generates parameterized continuous rain patterns based on arbitrary background images. Experimental results demonstrate that the network, trained with this synthetic data, remains highly robust even under extreme rainfall conditions. These findings highlight the effectiveness and robustness of our method across varying rainfall levels and datasets, setting new standards for video deraining tasks. The code will be released soon.