NEApr 25, 2023Code
Parallel Spiking Neurons with High Efficiency and Ability to Learn Long-term DependenciesWei Fang, Zhaofei Yu, Zhaokun Zhou et al.
Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset to a general formulation, we propose the Parallel Spiking Neuron (PSN), which generates hidden states that are independent of their predecessors, resulting in parallelizable neuronal dynamics and extremely high simulation speed. The weights of inputs in the PSN are fully connected, which maximizes the utilization of temporal information. To avoid the use of future inputs for step-by-step inference, the weights of the PSN can be masked, resulting in the masked PSN. By sharing weights across time-steps based on the masked PSN, the sliding PSN is proposed to handle sequences of varying lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first study about parallelizing spiking neurons and can be a cornerstone for the spiking deep learning research. Our codes are available at \url{https://github.com/fangwei123456/Parallel-Spiking-Neuron}.
LGFeb 25, 2023Code
A Unified Framework for Soft Threshold PruningYanqi Chen, Zhengyu Ma, Wei Fang et al.
Soft threshold pruning is among the cutting-edge pruning methods with state-of-the-art performance. However, previous methods either perform aimless searching on the threshold scheduler or simply set the threshold trainable, lacking theoretical explanation from a unified perspective. In this work, we reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA), a classic method from the fields of sparse recovery and compressed sensing. Under this theoretical framework, all threshold tuning strategies proposed in previous studies of soft threshold pruning are concluded as different styles of tuning $L_1$-regularization term. We further derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework. This scheduler keeps $L_1$-regularization coefficient stable, implying a time-invariant objective function from the perspective of optimization. In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD. We conduct extensive experiments and verify its state-of-the-art performance on both Artificial Neural Networks (ResNet-50 and MobileNet-V1) and Spiking Neural Networks (SEW ResNet-18) on ImageNet datasets. On the basis of this framework, we derive a family of pruning methods, including sparsify-during-training, early pruning, and pruning at initialization. The code is available at https://github.com/Yanqi-Chen/LATS.
NEApr 25, 2023
Binary stochasticity enabled highly efficient neuromorphic deep learning achieves better-than-software accuracyYang Li, Wei Wang, Ming Wang et al.
Deep learning needs high-precision handling of forwarding signals, backpropagating errors, and updating weights. This is inherently required by the learning algorithm since the gradient descent learning rule relies on the chain product of partial derivatives. However, it is challenging to implement deep learning in hardware systems that use noisy analog memristors as artificial synapses, as well as not being biologically plausible. Memristor-based implementations generally result in an excessive cost of neuronal circuits and stringent demands for idealized synaptic devices. Here, we demonstrate that the requirement for high precision is not necessary and that more efficient deep learning can be achieved when this requirement is lifted. We propose a binary stochastic learning algorithm that modifies all elementary neural network operations, by introducing (i) stochastic binarization of both the forwarding signals and the activation function derivatives, (ii) signed binarization of the backpropagating errors, and (iii) step-wised weight updates. Through an extensive hybrid approach of software simulation and hardware experiments, we find that binary stochastic deep learning systems can provide better performance than the software-based benchmarks using the high-precision learning algorithm. Also, the binary stochastic algorithm strongly simplifies the neural network operations in hardware, resulting in an improvement of the energy efficiency for the multiply-and-accumulate operations by more than three orders of magnitudes.
NEApr 24, 2023Code
Spikingformer: A Key Foundation Model for Spiking Neural NetworksChenlin Zhou, Liutao Yu, Zhaokun Zhou et al.
Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks, due to their event-driven spiking computation. However, some foundation SNN backbones (including Spikformer and SEW ResNet) suffer from non-spike computations (integer-float multiplications) caused by the structure of their residual connections. These non-spike computations increase SNNs' power consumption and make them unsuitable for deployment on mainstream neuromorphic hardware. In this paper, we analyze the spike-driven behavior of the residual connection methods in SNNs. We then present Spikingformer, a novel spiking transformer backbone that merges the MS Residual connection with Self-Attention in a biologically plausible way to address the non-spike computation challenge in Spikformer while maintaining global modeling capabilities. We evaluate Spikingformer across 13 datasets spanning large static images, neuromorphic data, and natural language tasks, and demonstrate the effectiveness and universality of Spikingformer, setting a vital benchmark for spiking neural networks. In addition, with the spike-driven features and global modeling capabilities, Spikingformer is expected to become a more efficient general-purpose SNN backbone towards energy-efficient artificial intelligence. Code: https://github.com/TheBrainLab/Spikingformer
CVMar 1, 2023
The style transformer with common knowledge optimization for image-text retrievalWenrui Li, Zhengyu Ma, Jinqiao Shi et al.
Image-text retrieval which associates different modalities has drawn broad attention due to its excellent research value and broad real-world application. However, most of the existing methods haven't taken the high-level semantic relationships ("style embedding") and common knowledge from multi-modalities into full consideration. To this end, we introduce a novel style transformer network with common knowledge optimization (CKSTN) for image-text retrieval. The main module is the common knowledge adaptor (CKA) with both the style embedding extractor (SEE) and the common knowledge optimization (CKO) modules. Specifically, the SEE uses the sequential update strategy to effectively connect the features of different stages in SEE. The CKO module is introduced to dynamically capture the latent concepts of common knowledge from different modalities. Besides, to get generalized temporal common knowledge, we propose a sequential update strategy to effectively integrate the features of different layers in SEE with previous common feature units. CKSTN demonstrates the superiorities of the state-of-the-art methods in image-text retrieval on MSCOCO and Flickr30K datasets. Moreover, CKSTN is constructed based on the lightweight transformer which is more convenient and practical for the application of real scenes, due to the better performance and lower parameters.
CVMay 16, 2024Code
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object DetectionTianhe Ren, Qing Jiang, Shilong Liu et al.
This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model optimized for faster speed demanded in many applications requiring edge deployment. The Grounding DINO 1.5 Pro model advances its predecessor by scaling up the model architecture, integrating an enhanced vision backbone, and expanding the training dataset to over 20 million images with grounding annotations, thereby achieving a richer semantic understanding. The Grounding DINO 1.5 Edge model, while designed for efficiency with reduced feature scales, maintains robust detection capabilities by being trained on the same comprehensive dataset. Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection. Furthermore, the Grounding DINO 1.5 Edge model, when optimized with TensorRT, achieves a speed of 75.2 FPS while attaining a zero-shot performance of 36.2 AP on the LVIS-minival benchmark, making it more suitable for edge computing scenarios. Model examples and demos with API will be released at https://github.com/IDEA-Research/Grounding-DINO-1.5-API
NEMar 25, 2024Code
QKFormer: Hierarchical Spiking Transformer using Q-K AttentionChenlin Zhou, Han Zhang, Zhaokun Zhou et al.
Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%. To our best knowledge, this is the first time that directly training SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer
NEApr 14
Adaptive Spiking Neurons for Vision and Language ModelingChenlin Zhou, Sihang Guo, Jiaqi Wang et al.
Regarded as the third generation of neural networks, Spiking Neural Networks (SNNs) have garnered significant traction due to their biological plausibility and energy efficiency. Recent advancements in large models necessitate spiking neurons capable of high performance, adaptability, and training efficiency. In this work, we first propose a novel functional perspective that provides general guidance for designing the new generation of spiking neurons. Following the insightful guidelines, we propose the Adaptive Spiking Neuron (ASN), which incorporates trainable parameters to learn membrane potential dynamics and enable adaptive firing. ASN adopts an integer training and spike inference paradigm, facilitating efficient SNN training. To further enhance robustness, we propose a specialized variant of ASN, the Normalized Adaptive Spiking Neuron (NASN), which integrates normalization to stabilize training. We evaluate our neuron model on 19 datasets spanning five distinct tasks in both vision and language modalities, demonstrating the effectiveness and versatility of the ASN family. Our ASN family is expected to become the new generation of general-purpose spiking neurons.
NEApr 13
Winner-Take-All Spiking Transformer for Language ModelingChenlin Zhou, Sihang Guo, Jiaqi Wang et al.
Spiking Transformers, which combine the scalability of Transformers with the sparse, energy-efficient property of Spiking Neural Networks (SNNs), have achieved impressive results in neuromorphic and vision tasks and attracted increasing attention. However, existing directly trained spiking transformers primarily focus on vision tasks. For language modeling with spiking transformer, convergence relies heavily on softmax-based spiking self-attention, which incurs high energy costs and poses challenges for neuromorphic deployment. To address this issue, we introduce Winner-Take-All (WTA) mechanisms into spiking transformers and propose two novel softmax-free, spike-driven self-attention modules: WTA Spiking Self-Attention (WSSA) and Causal WTA Spiking Self-Attention (CWSSA). Based on them, we design WTA-based Encoder-only Spiking Transformer (WE-Spikingformer) for masked language modeling and WTA-based Decoder-only Spiking Transformer (WD-Spikingformer) for causal language modeling, systematically exploring softmax-free, spiking-driven Transformer architectures trained end-to-end for natural language processing tasks. Extensive experiments on 16 datasets spanning natural language understanding, question-answering tasks, and commonsense reasoning tasks validate the effectiveness of our approach and highlight the promise of spiking transformers for general language modeling and energy-efficient artificial intelligence.
SDNov 11, 2025
SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command RecognitionJiaqi Wang, Liutao Yu, Xiongri Shen et al.
Spiking neural networks (SNNs) offer a promising path toward energy-efficient speech command recognition (SCR) by leveraging their event-driven processing paradigm. However, existing SNN-based SCR methods often struggle to capture rich temporal dependencies and contextual information from speech due to limited temporal modeling and binary spike-based representations. To address these challenges, we first introduce the multi-view spiking temporal-aware self-attention (MSTASA) module, which combines effective spiking temporal-aware attention with a multi-view learning framework to model complementary temporal dependencies in speech commands. Building on MSTASA, we further propose SpikCommander, a fully spike-driven transformer architecture that integrates MSTASA with a spiking contextual refinement channel MLP (SCR-MLP) to jointly enhance temporal context modeling and channel-wise feature integration. We evaluate our method on three benchmark datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC), and the Google Speech Commands V2 (GSC). Extensive experiments demonstrate that SpikCommander consistently outperforms state-of-the-art (SOTA) SNN approaches with fewer parameters under comparable time steps, highlighting its effectiveness and efficiency for robust speech command recognition.
LGFeb 11
Hierarchical Zero-Order Optimization for Deep Neural NetworksSansheng Cao, Zhengyu Ma, Yonghong Tian
Zeroth-order (ZO) optimization has long been favored for its biological plausibility and its capacity to handle non-differentiable objectives, yet its computational complexity has historically limited its application in deep neural networks. Challenging the conventional paradigm that gradients propagate layer-by-layer, we propose Hierarchical Zeroth-Order (HZO) optimization, a novel divide-and-conquer strategy that decomposes the depth dimension of the network. We prove that HZO reduces the query complexity from $O(ML^2)$ to $O(ML \log L)$ for a network of width $M$ and depth $L$, representing a significant leap over existing ZO methodologies. Furthermore, we provide a detailed error analysis showing that HZO maintains numerical stability by operating near the unitary limit ($L_{lip} \approx 1$). Extensive evaluations on CIFAR-10 and ImageNet demonstrate that HZO achieves competitive accuracy compared to backpropagation.
LGAug 7, 2025Code
S$^2$M-Former: Spiking Symmetric Mixing Branchformer for Brain Auditory Attention DetectionJiaqi Wang, Zhengyu Ma, Xiongri Shen et al.
Auditory attention detection (AAD) aims to decode listeners' focus in complex auditory environments from electroencephalography (EEG) recordings, which is crucial for developing neuro-steered hearing devices. Despite recent advancements, EEG-based AAD remains hindered by the absence of synergistic frameworks that can fully leverage complementary EEG features under energy-efficiency constraints. We propose S$^2$M-Former, a novel spiking symmetric mixing framework to address this limitation through two key innovations: i) Presenting a spike-driven symmetric architecture composed of parallel spatial and frequency branches with mirrored modular design, leveraging biologically plausible token-channel mixers to enhance complementary learning across branches; ii) Introducing lightweight 1D token sequences to replace conventional 3D operations, reducing parameters by 14.7$\times$. The brain-inspired spiking architecture further reduces power consumption, achieving a 5.8$\times$ energy reduction compared to recent ANN methods, while also surpassing existing SNN baselines in terms of parameter efficiency and performance. Comprehensive experiments on three AAD benchmarks (KUL, DTU and AV-GC-AAD) across three settings (within-trial, cross-trial and cross-subject) demonstrate that S$^2$M-Former achieves comparable state-of-the-art (SOTA) decoding accuracy, making it a promising low-power, high-performance solution for AAD tasks. Code is available at https://github.com/JackieWang9811/S2M-Former.
CVNov 21, 2024
DINO-X: A Unified Vision Model for Open-World Object Detection and UnderstandingTianhe Ren, Yihao Chen, Qing Jiang et al.
In this paper, we introduce DINO-X, which is a unified object-centric vision model developed by IDEA Research with the best open-world object detection performance to date. DINO-X employs the same Transformer-based encoder-decoder architecture as Grounding DINO 1.5 to pursue an object-level representation for open-world object understanding. To make long-tailed object detection easy, DINO-X extends its input options to support text prompt, visual prompt, and customized prompt. With such flexible prompt options, we develop a universal object prompt to support prompt-free open-world detection, making it possible to detect anything in an image without requiring users to provide any prompt. To enhance the model's core grounding capability, we have constructed a large-scale dataset with over 100 million high-quality grounding samples, referred to as Grounding-100M, for advancing the model's open-vocabulary detection performance. Pre-training on such a large-scale grounding dataset leads to a foundational object-level representation, which enables DINO-X to integrate multiple perception heads to simultaneously support multiple object perception and understanding tasks, including detection, segmentation, pose estimation, object captioning, object-based QA, etc. Experimental results demonstrate the superior performance of DINO-X. Specifically, the DINO-X Pro model achieves 56.0 AP, 59.8 AP, and 52.4 AP on the COCO, LVIS-minival, and LVIS-val zero-shot object detection benchmarks, respectively. Notably, it scores 63.3 AP and 56.5 AP on the rare classes of LVIS-minival and LVIS-val benchmarks, improving the previous SOTA performance by 5.8 AP and 5.0 AP. Such a result underscores its significantly improved capacity for recognizing long-tailed objects.
CLFeb 27, 2024
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked AutoencoderJiaqi Wang, Zhenxi Song, Zhengyu Ma et al.
Reconstructing natural language from non-invasive electroencephalography (EEG) holds great promise as a language decoding technology for brain-computer interfaces (BCIs). However, EEG-based language decoding is still in its nascent stages, facing several technical issues such as: 1) Absence of a hybrid strategy that can effectively integrate cross-modality (between EEG and text) self-learning with intra-modality self-reconstruction of EEG features or textual sequences; 2) Under-utilization of large language models (LLMs) to enhance EEG-based language decoding. To address above issues, we propose the Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text through a dedicated multi-stream encoder. Furthermore, we develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations), which leverages pre-trained modules alongside the EEG stream from CET-MAE and further enables an LLM (specifically BART) to decode text from EEG sequences. Comprehensive experiments conducted on the popular text-evoked EEG database, ZuCo, demonstrate the superiority of E2T-PTR, which outperforms the state-of-the-art in ROUGE-1 F1 and BLEU-4 scores by 8.34% and 32.21%, respectively. These results indicate significant advancements in the field and underscores the proposed framework's potential to enable more powerful and widespread BCI applications.
NEApr 14
BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment DistillationSihang Guo, Chenlin Zhou, Jiaqi Wang et al.
Spiking Neural Networks (SNNs) offer promising energy-efficient alternatives to large language models (LLMs) due to their event-driven nature and ultra-low power consumption. However, to preserve capacity, most existing spiking LLMs still incur intensive floating-point matrix multiplication (MatMul) and nonlinearities, or training difficulties arising from the complex spatiotemporal dynamics. To address these challenges, we propose BiSpikCLM, the first fully binary spiking MatMul-free causal language model. BiSpikCLM introduces Softmax-Free Spiking Attention (SFSA), eliminating softmax and floating-point operations in autoregressive language modeling. For efficient training, we introduce Spike-Aware Alignment Distillation (SpAD), which aligns ANN teacher and SNN student across embeddings, attention maps, intermediate features, and output logits. SpAD framework allows BiSpikCLM to reach comparable performance to ANN counterparts using substantially fewer training tokens (e.g., only 5.6% of the tokens for the 1.3B model). As a result, BiSpikCLM achieves competitive performance at only 4.16% - 5.87% of the computational cost on natural language generation tasks. Our results highlight the feasibility and effectiveness of fully binary spike-driven LLMs and establish the distillation as a promising pathway for brain-inspired spiking NLP.
ARNov 26, 2024
A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN TrainingMingjing Li, Huihui Zhou, Xiaofeng Xu et al.
There is a growing necessity for edge training to adapt to dynamically changing environment. Neuromorphic computing represents a significant pathway for high-efficiency intelligent computation in energy-constrained edges, but existing neuromorphic architectures lack the ability of directly training spiking neural networks (SNNs) based on backpropagation. We develop a multi-core neuromorphic architecture with Feedforward-Propagation, Back-Propagation, and Weight-Gradient engines in each core, supporting high efficient parallel computing at both the engine and core levels. It combines various data flows and sparse computation optimization by fully leveraging the sparsity in SNN training, obtaining a high energy efficiency of 1.05TFLOPS/W@ FP16 @ 28nm, 55 ~ 85% reduction of DRAM access compared to A100 GPU in SNN trainings, and a 20-core deep SNN training and a 5-worker federated learning on FPGAs. Our study develops the first multi-core neuromorphic architecture supporting the direct SNN training, facilitating the neuromorphic computing in edge-learnable applications.
LGDec 17, 2024
Efficient Speech Command Recognition Leveraging Spiking Neural Network and Curriculum Learning-based Knowledge DistillationJiaqi Wang, Liutao Yu, Liwei Huang et al.
The intrinsic dynamics and event-driven nature of spiking neural networks (SNNs) make them excel in processing temporal information by naturally utilizing embedded time sequences as time steps. Recent studies adopting this approach have demonstrated SNNs' effectiveness in speech command recognition, achieving high performance by employing large time steps for long time sequences. However, the large time steps lead to increased deployment burdens for edge computing applications. Thus, it is important to balance high performance and low energy consumption when detecting temporal patterns in edge devices. Our solution comprises two key components. 1). We propose a high-performance fully spike-driven framework termed SpikeSCR, characterized by a global-local hybrid structure for efficient representation learning, which exhibits long-term learning capabilities with extended time steps. 2). To further fully embrace low energy consumption, we propose an effective knowledge distillation method based on curriculum learning (KDCL), where valuable representations learned from the easy curriculum are progressively transferred to the hard curriculum with minor loss, striking a trade-off between power efficiency and high performance. We evaluate our method on three benchmark datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC), and the Google Speech Commands (GSC) V2. Our experimental results demonstrate that SpikeSCR outperforms current state-of-the-art (SOTA) methods across these three datasets with the same time steps. Furthermore, by executing KDCL, we reduce the number of time steps by 60% and decrease energy consumption by 54.8% while maintaining comparable performance to recent SOTA results. Therefore, this work offers valuable insights for tackling temporal processing challenges with long time sequences in edge neuromorphic computing systems.
NEAug 18, 2025
A Self-Ensemble Inspired Approach for Effective Training of Binary-Weight Spiking Neural NetworksQingyan Meng, Mingqing Xiao, Zhengyu Ma et al.
Spiking Neural Networks (SNNs) are a promising approach to low-power applications on neuromorphic hardware due to their energy efficiency. However, training SNNs is challenging because of the non-differentiable spike generation function. To address this issue, the commonly used approach is to adopt the backpropagation through time framework, while assigning the gradient of the non-differentiable function with some surrogates. Similarly, Binary Neural Networks (BNNs) also face the non-differentiability problem and rely on approximating gradients. However, the deep relationship between these two fields and how their training techniques can benefit each other has not been systematically researched. Furthermore, training binary-weight SNNs is even more difficult. In this work, we present a novel perspective on the dynamics of SNNs and their close connection to BNNs through an analysis of the backpropagation process. We demonstrate that training a feedforward SNN can be viewed as training a self-ensemble of a binary-activation neural network with noise injection. Drawing from this new understanding of SNN dynamics, we introduce the Self-Ensemble Inspired training method for (Binary-Weight) SNNs (SEI-BWSNN), which achieves high-performance results with low latency even for the case of the 1-bit weights. Specifically, we leverage a structure of multiple shortcuts and a knowledge distillation-based training technique to improve the training of (binary-weight) SNNs. Notably, by binarizing FFN layers in a Transformer architecture, our approach achieves 82.52% accuracy on ImageNet with only 2 time steps, indicating the effectiveness of our methodology and the potential of binary-weight SNNs.
CVJun 21, 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action RecognitionLiutao Yu, Liwei Huang, Chenlin Zhou et al.
Video action recognition (VAR) plays crucial roles in various domains such as surveillance, healthcare, and industrial automation, making it highly significant for the society. Consequently, it has long been a research spot in the computer vision field. As artificial neural networks (ANNs) are flourishing, convolution neural networks (CNNs), including 2D-CNNs and 3D-CNNs, as well as variants of the vision transformer (ViT), have shown impressive performance on VAR. However, they usually demand huge computational cost due to the large data volume and heavy information redundancy introduced by the temporal dimension. To address this challenge, some researchers have turned to brain-inspired spiking neural networks (SNNs), such as recurrent SNNs and ANN-converted SNNs, leveraging their inherent temporal dynamics and energy efficiency. Yet, current SNNs for VAR also encounter limitations, such as nontrivial input preprocessing, intricate network construction/training, and the need for repetitive processing of the same video clip, hindering their practical deployment. In this study, we innovatively propose the directly trained SVFormer (Spiking Video transFormer) for VAR. SVFormer integrates local feature extraction, global self-attention, and the intrinsic dynamics, sparsity, and spike-driven nature of SNNs, to efficiently and effectively extract spatio-temporal features. We evaluate SVFormer on two RGB datasets (UCF101, NTU-RGBD60) and one neuromorphic dataset (DVS128-Gesture), demonstrating comparable performance to the mainstream models in a more efficient way. Notably, SVFormer achieves a top-1 accuracy of 84.03% with ultra-low power consumption (21 mJ/video) on UCF101, which is state-of-the-art among directly trained deep SNNs, showcasing significant advantages over prior models.
CVMay 23, 2023
Temporal Contrastive Learning for Spiking Neural NetworksHaonan Qiu, Zeyin Song, Yanqi Chen et al.
Biologically inspired spiking neural networks (SNNs) have garnered considerable attention due to their low-energy consumption and spatio-temporal information processing capabilities. Most existing SNNs training methods first integrate output information across time steps, then adopt the cross-entropy (CE) loss to supervise the prediction of the average representations. However, in this work, we find the method above is not ideal for the SNNs training as it omits the temporal dynamics of SNNs and degrades the performance quickly with the decrease of inference time steps. One tempting method to model temporal correlations is to apply the same label supervision at each time step and treat them identically. Although it can acquire relatively consistent performance across various time steps, it still faces challenges in obtaining SNNs with high performance. Inspired by these observations, we propose Temporal-domain supervised Contrastive Learning (TCL) framework, a novel method to obtain SNNs with low latency and high performance by incorporating contrastive supervision with temporal domain information. Contrastive learning (CL) prompts the network to discern both consistency and variability in the representation space, enabling it to better learn discriminative and generalizable features. We extend this concept to the temporal domain of SNNs, allowing us to flexibly and fully leverage the correlation between representations at different time steps. Furthermore, we propose a Siamese Temporal-domain supervised Contrastive Learning (STCL) framework to enhance the SNNs via augmentation, temporal and class constraints simultaneously. Extensive experimental results demonstrate that SNNs trained by our TCL and STCL can achieve both high performance and low latency, achieving state-of-the-art performance on a variety of datasets (e.g., CIFAR-10, CIFAR-100, and DVS-CIFAR10).
NEMay 10, 2023
Enhancing the Performance of Transformer-based Spiking Neural Networks by SNN-optimized Downsampling with Precise Gradient BackpropagationChenlin Zhou, Han Zhang, Zhaokun Zhou et al.
Deep spiking neural networks (SNNs) have drawn much attention in recent years because of their low power consumption, biological rationality and event-driven property. However, state-of-the-art deep SNNs (including Spikformer and Spikingformer) suffer from a critical challenge related to the imprecise gradient backpropagation. This problem arises from the improper design of downsampling modules in these networks, and greatly hampering the overall model performance. In this paper, we propose ConvBN-MaxPooling-LIF (CML), an SNN-optimized downsampling with precise gradient backpropagation. We prove that CML can effectively overcome the imprecision of gradient backpropagation from a theoretical perspective. In addition, we evaluate CML on ImageNet, CIFAR10, CIFAR100, CIFAR10-DVS, DVS128-Gesture datasets, and show state-of-the-art performance on all these datasets with significantly enhanced performances compared with Spikingformer. For instance, our model achieves 77.64 $\%$ on ImageNet, 96.04 $\%$ on CIFAR10, 81.4$\%$ on CIFAR10-DVS, with + 1.79$\%$ on ImageNet, +1.16$\%$ on CIFAR100 compared with Spikingformer.
IRJun 3, 2019
Mining Data from the Congressional RecordZhengyu Ma, Tianjiao Qi, James Route et al.
We propose a data storage and analysis method for using the US Congressional record as a policy analysis tool. We use Amazon Web Services and the Solr search engine to store and process Congressional record data from 1789 to the present, and then query Solr to find how frequently language related to tax increases and decreases appears. This frequency data is compared to six economic indicators. Our preliminary results indicate potential relationships between incidence of tax discussion and multiple indicators. We present our data storage and analysis procedures, as well as results from comparisons to all six indicators.