CVJun 21, 2023
Spiking Neural Network for Ultra-low-latency and High-accurate Object DetectionJinye Qu, Zeyu Gao, Tielin Zhang et al.
Spiking Neural Networks (SNNs) have garnered widespread interest for their energy efficiency and brain-inspired event-driven properties. While recent methods like Spiking-YOLO have expanded the SNNs to more challenging object detection tasks, they often suffer from high latency and low detection accuracy, making them difficult to deploy on latency sensitive mobile platforms. Furthermore, the conversion method from Artificial Neural Networks (ANNs) to SNNs is hard to maintain the complete structure of the ANNs, resulting in poor feature representation and high conversion errors. To address these challenges, we propose two methods: timesteps compression and spike-time-dependent integrated (STDI) coding. The former reduces the timesteps required in ANN-SNN conversion by compressing information, while the latter sets a time-varying threshold to expand the information holding capacity. We also present a SNN-based ultra-low latency and high accurate object detection model (SUHD) that achieves state-of-the-art performance on nontrivial datasets like PASCAL VOC and MS COCO, with about remarkable 750x fewer timesteps and 30% mean average precision (mAP) improvement, compared to the Spiking-YOLO on MS COCO datasets. To the best of our knowledge, SUHD is the deepest spike-based object detection model to date that achieves ultra low timesteps to complete the lossless conversion.
LGSep 25, 2023
ODE-based Recurrent Model-free Reinforcement Learning for POMDPsXuanle Zhao, Duzhen Zhang, Liyuan Han et al.
Neural ordinary differential equations (ODEs) are widely recognized as the standard for modeling physical mechanisms, which help to perform approximate inference in unknown physical or biological environments. In partially observable (PO) environments, how to infer unseen information from raw observations puzzled the agents. By using a recurrent policy with a compact context, context-based reinforcement learning provides a flexible way to extract unobservable information from historical transitions. To help the agent extract more dynamics-related information, we present a novel ODE-based recurrent model combines with model-free reinforcement learning (RL) framework to solve partially observable Markov decision processes (POMDPs). We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks. Furthermore, our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
NEDec 29, 2022
Tuning Synaptic Connections instead of Weights by Genetic Algorithm in Spiking Policy NetworkDuzhen Zhang, Tielin Zhang, Shuncheng Jia et al.
Learning from interaction is the primary way that biological agents acquire knowledge about their environment and themselves. Modern deep reinforcement learning (DRL) explores a computational approach to learning from interaction and has made significant progress in solving various tasks. However, despite its power, DRL still falls short of biological agents in terms of energy efficiency. Although the underlying mechanisms are not fully understood, we believe that the integration of spiking communication between neurons and biologically-plausible synaptic plasticity plays a prominent role in achieving greater energy efficiency. Following this biological intuition, we optimized a spiking policy network (SPN) using a genetic algorithm as an energy-efficient alternative to DRL. Our SPN mimics the sensorimotor neuron pathway of insects and communicates through event-based spikes. Inspired by biological research showing that the brain forms memories by creating new synaptic connections and rewiring these connections based on new experiences, we tuned the synaptic connections instead of weights in the SPN to solve given tasks. Experimental results on several robotic control tasks demonstrate that our method can achieve the same level of performance as mainstream DRL methods while exhibiting significantly higher energy efficiency.
NESep 14, 2024
Multiscale fusion enhanced spiking neural network for invasive BCI neural signal decodingYu Song, Liyuan Han, Bo Xu et al.
Brain-computer interfaces (BCIs) are an advanced fusion of neuroscience and artificial intelligence, requiring stable and long-term decoding of neural signals. Spiking Neural Networks (SNNs), with their neuronal dynamics and spike-based signal processing, are inherently well-suited for this task. This paper presents a novel approach utilizing a Multiscale Fusion enhanced Spiking Neural Network (MFSNN). The MFSNN emulates the parallel processing and multiscale feature fusion seen in human visual perception to enable real-time, efficient, and energy-conserving neural signal decoding. Initially, the MFSNN employs temporal convolutional networks and channel attention mechanisms to extract spatiotemporal features from raw data. It then enhances decoding performance by integrating these features through skip connections. Additionally, the MFSNN improves generalizability and robustness in cross-day signal decoding through mini-batch supervised generalization learning. In two benchmark invasive BCI paradigms, including the single-hand grasp-and-touch and center-and-out reach tasks, the MFSNN surpasses traditional artificial neural network methods, such as MLP and GRU, in both accuracy and computational efficiency. Moreover, the MFSNN's multiscale feature fusion framework is well-suited for the implementation on neuromorphic chips, offering an energy-efficient solution for online decoding of invasive BCI signals.
CVAug 2, 2023
Attention-free Spikformer: Mixing Spike Sequences with Simple Linear TransformsQingyu Wang, Duzhen Zhang, Tielin Zhang et al.
By integrating the self-attention capability and the biological properties of Spiking Neural Networks (SNNs), Spikformer applies the flourishing Transformer architecture to SNNs design. It introduces a Spiking Self-Attention (SSA) module to mix sparse visual features using spike-form Query, Key, and Value, resulting in the State-Of-The-Art (SOTA) performance on numerous datasets compared to previous SNN-like frameworks. In this paper, we demonstrate that the Spikformer architecture can be accelerated by replacing the SSA with an unparameterized Linear Transform (LT) such as Fourier and Wavelet transforms. These transforms are utilized to mix spike sequences, reducing the quadratic time complexity to log-linear time complexity. They alternate between the frequency and time domains to extract sparse visual features, showcasing powerful performance and efficiency. We conduct extensive experiments on image classification using both neuromorphic and static datasets. The results indicate that compared to the SOTA Spikformer with SSA, Spikformer with LT achieves higher Top-1 accuracy on neuromorphic datasets (i.e., CIFAR10-DVS and DVS128 Gesture) and comparable Top-1 accuracy on static datasets (i.e., CIFAR-10 and CIFAR-100). Furthermore, Spikformer with LT achieves approximately 29-51% improvement in training speed, 61-70% improvement in inference speed, and reduces memory usage by 4-26% due to not requiring learnable parameters.
LGNov 21, 2023
Enhancing Solutions for Complex PDEs: Introducing Complementary Convolution and Equivariant Attention in Fourier Neural OperatorsXuanle Zhao, Yue Sun, Tielin Zhang et al.
Neural operators improve conventional neural networks by expanding their capabilities of functional mappings between different function spaces to solve partial differential equations (PDEs). One of the most notable methods is the Fourier Neural Operator (FNO), which draws inspiration from Green's function method and directly approximates operator kernels in the frequency domain. However, after empirical observation followed by theoretical validation, we demonstrate that the FNO approximates kernels primarily in a relatively low-frequency domain. This suggests a limited capability in solving complex PDEs, particularly those characterized by rapid coefficient changes and oscillations in the solution space. Such cases are crucial in specific scenarios, like atmospheric convection and ocean circulation. To address this challenge, inspired by the translation equivariant of the convolution kernel, we propose a novel hierarchical Fourier neural operator along with convolution-residual layers and attention mechanisms to make them complementary in the frequency domain to solve complex PDEs. We perform experiments on forward and reverse problems of multiscale elliptic equations, Navier-Stokes equations, and other physical scenarios, and find that the proposed method achieves superior performance in these PDE benchmarks, especially for equations characterized by rapid coefficient variations.
NENov 12, 2022
Motif-topology improved Spiking Neural Network for the Cocktail Party Effect and McGurk EffectShuncheng Jia, Tielin Zhang, Ruichen Zuo et al.
Network architectures and learning principles are playing key in forming complex functions in artificial neural networks (ANNs) and spiking neural networks (SNNs). SNNs are considered the new-generation artificial networks by incorporating more biological features than ANNs, including dynamic spiking neurons, functionally specified architectures, and efficient learning paradigms. Network architectures are also considered embodying the function of the network. Here, we propose a Motif-topology improved SNN (M-SNN) for the efficient multi-sensory integration and cognitive phenomenon simulations. The cognitive phenomenon simulation we simulated includes the cocktail party effect and McGurk effect, which are discussed by many researchers. Our M-SNN constituted by the meta operator called network motifs. The source of 3-node network motifs topology from artificial one pre-learned from the spatial or temporal dataset. In the single-sensory classification task, the results showed the accuracy of M-SNN using network motif topologies was higher than the pure feedforward network topology without using them. In the multi-sensory integration task, the performance of M-SNN using artificial network motif was better than the state-of-the-art SNN using BRP (biologically-plausible reward propagation). Furthermore, the M-SNN could better simulate the cocktail party effect and McGurk effect with lower computational cost. We think the artificial network motifs could be considered as some prior knowledge that would contribute to the multi-sensory integration of SNNs and provide more benefits for simulating the cognitive phenomenon.
32.5NEApr 30
UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer InterfacesBinjie Hong, Rui Xiong, Liyuan Han et al.
Modeling invasive neural spike data is fundamental to advancing high-performance brain-computer interfaces (BCIs). However, existing approaches face critical challenges, including limited-scale heterogeneous data, cross-domain distribution shift, and the intrinsic spatiotemporal complexity of invasive neural signals. In this work, we propose UniBCI, a unified pretrained model for invasive Brain-Computer Interfaces. The model integrates three key components: (1) a context-conditioned spatio-temporal tokenization (CST) scheme that embeds neural signals together with metadata into a shared representation space; (2) a hierarchical Interval-Area Attention (IAA) mechanism that captures patterns of spike dynamics in slots via linear attention and locality dependencies via sliding-window attention; and (3) a scalable self-supervised masked signals reconstruction objective for learning generalizable neural representations from large-scale unlabeled data. We construct a pretraining corpus spanning multiple species, subjects, brain regions, and behavioral experiment paradigms. These heterogeneous recordings are standardize via our proposed unified normalization and tokenization. Comprehensive experiments demonstrate that UniBCI achieves SOTA performance across diverse downstream tasks while improving generalization. Moreover, the model achieves a strong balance between accuracy and efficiency, with fewer trainable parameters and lower inference latency. These results suggest that UniBCI provides a practical step toward general-purpose neural foundation models, enabling robust, scalable, and transferable representation learning for invasive neural data. The code for this paper is available at: https://anonymous.4open.science/r/UniBCI-C805.
CVMay 23, 2024
Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural NetworksXuanle Zhao, Yue Sun, Tielin Zhang et al.
Spatiotemporal prediction plays an important role in solving natural problems and processing video frames, especially in weather forecasting and human action recognition. Recent advances attempt to incorporate prior physical knowledge into the deep learning framework to estimate the unknown governing partial differential equations (PDEs), which have shown promising results in spatiotemporal prediction tasks. However, previous approaches only restrict neural network architectures or loss functions to acquire physical or PDE features, which decreases the representative capacity of a neural network. Meanwhile, the updating process of the physical state cannot be effectively estimated. To solve the above mentioned problems, this paper proposes a physical-guided neural network, which utilizes the frequency-enhanced Fourier module and moment loss to strengthen the model's ability to estimate the spatiotemporal dynamics. Furthermore, we propose an adaptive second-order Runge-Kutta method with physical constraints to model the physical states more precisely. We evaluate our model on both spatiotemporal and video prediction tasks. The experimental results show that our model outperforms state-of-the-art methods and performs best in several datasets, with a much smaller parameter count.
CLAug 17, 2025
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge GraphDuzhen Zhang, Zixiao Wang, Zhong-Zhi Li et al.
The rapid expansion of medical literature presents growing challenges for structuring and integrating domain knowledge at scale. Knowledge Graphs (KGs) offer a promising solution by enabling efficient retrieval, automated reasoning, and knowledge discovery. However, current KG construction methods often rely on supervised pipelines with limited generalizability or naively aggregate outputs from Large Language Models (LLMs), treating biomedical corpora as static and ignoring the temporal dynamics and contextual uncertainty of evolving knowledge. To address these limitations, we introduce MedKGent, a LLM agent framework for constructing temporally evolving medical KGs. Leveraging over 10 million PubMed abstracts published between 1975 and 2023, we simulate the emergence of biomedical knowledge via a fine-grained daily time series. MedKGent incrementally builds the KG in a day-by-day manner using two specialized agents powered by the Qwen2.5-32B-Instruct model. The Extractor Agent identifies knowledge triples and assigns confidence scores via sampling-based estimation, which are used to filter low-confidence extractions and inform downstream processing. The Constructor Agent incrementally integrates the retained triples into a temporally evolving graph, guided by confidence scores and timestamps to reinforce recurring knowledge and resolve conflicts. The resulting KG contains 156,275 entities and 2,971,384 relational triples. Quality assessments by two SOTA LLMs and three domain experts demonstrate an accuracy approaching 90%, with strong inter-rater agreement. To evaluate downstream utility, we conduct RAG across seven medical question answering benchmarks using five leading LLMs, consistently observing significant improvements over non-augmented baselines. Case studies further demonstrate the KG's value in literature-based drug repurposing via confidence-aware causal inference.
CVAug 6, 2025
Revisiting Continual Semantic Segmentation with Pre-trained Vision ModelsDuzhen Zhang, Yong Ren, Wei Cong et al.
Continual Semantic Segmentation (CSS) seeks to incrementally learn to segment novel classes while preserving knowledge of previously encountered ones. Recent advancements in CSS have been largely driven by the adoption of Pre-trained Vision Models (PVMs) as backbones. Among existing strategies, Direct Fine-Tuning (DFT), which sequentially fine-tunes the model across classes, remains the most straightforward approach. Prior work often regards DFT as a performance lower bound due to its presumed vulnerability to severe catastrophic forgetting, leading to the development of numerous complex mitigation techniques. However, we contend that this prevailing assumption is flawed. In this paper, we systematically revisit forgetting in DFT across two standard benchmarks, Pascal VOC 2012 and ADE20K, under eight CSS settings using two representative PVM backbones: ResNet101 and Swin-B. Through a detailed probing analysis, our findings reveal that existing methods significantly underestimate the inherent anti-forgetting capabilities of PVMs. Even under DFT, PVMs retain previously learned knowledge with minimal forgetting. Further investigation of the feature space indicates that the observed forgetting primarily arises from the classifier's drift away from the PVM, rather than from degradation of the backbone representations. Based on this insight, we propose DFT*, a simple yet effective enhancement to DFT that incorporates strategies such as freezing the PVM backbone and previously learned classifiers, as well as pre-allocating future classifiers. Extensive experiments show that DFT* consistently achieves competitive or superior performance compared to sixteen state-of-the-art CSS methods, while requiring substantially fewer trainable parameters and less training time.
CLMay 27, 2025
Information-Theoretic Complementary Prompts for Improved Continual Text ClassificationDuzhen Zhang, Yong Ren, Chenxing Li et al.
Continual Text Classification (CTC) aims to continuously classify new text data over time while minimizing catastrophic forgetting of previously acquired knowledge. However, existing methods often focus on task-specific knowledge, overlooking the importance of shared, task-agnostic knowledge. Inspired by the complementary learning systems theory, which posits that humans learn continually through the interaction of two systems -- the hippocampus, responsible for forming distinct representations of specific experiences, and the neocortex, which extracts more general and transferable representations from past experiences -- we introduce Information-Theoretic Complementary Prompts (InfoComp), a novel approach for CTC. InfoComp explicitly learns two distinct prompt spaces: P(rivate)-Prompt and S(hared)-Prompt. These respectively encode task-specific and task-invariant knowledge, enabling models to sequentially learn classification tasks without relying on data replay. To promote more informative prompt learning, InfoComp uses an information-theoretic framework that maximizes mutual information between different parameters (or encoded representations). Within this framework, we design two novel loss functions: (1) to strengthen the accumulation of task-specific knowledge in P-Prompt, effectively mitigating catastrophic forgetting, and (2) to enhance the retention of task-invariant knowledge in S-Prompt, improving forward knowledge transfer. Extensive experiments on diverse CTC benchmarks show that our approach outperforms previous state-of-the-art methods.
AIMay 10, 2023
Mixture of personality improved Spiking actor network for efficient multi-agent cooperationXiyun Li, Ziyi Ni, Jingqing Ruan et al.
Adaptive human-agent and agent-agent cooperation are becoming more and more critical in the research area of multi-agent reinforcement learning (MARL), where remarked progress has been made with the help of deep neural networks. However, many established algorithms can only perform well during the learning paradigm but exhibit poor generalization during cooperation with other unseen partners. The personality theory in cognitive psychology describes that humans can well handle the above cooperation challenge by predicting others' personalities first and then their complex actions. Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN), whereby a determinantal point process is used to simulate the complex formation and integration of different types of personality in MoP, and dynamic and spiking neurons are incorporated into the SAN for the efficient reinforcement learning. The benchmark Overcooked task, containing a strong requirement for cooperative cooking, is selected to test the proposed MoP-SAN. The experimental results show that the MoP-SAN can achieve both high performances during not only the learning paradigm but also the generalization test (i.e., cooperation with other unseen agents) paradigm where most counterpart deep actor networks failed. Necessary ablation experiments and visualization analyses were conducted to explain why MoP and SAN are effective in multi-agent reinforcement learning scenarios while DNN performs poorly in the generalization test.
NEFeb 11, 2022
Motif-topology and Reward-learning improved Spiking Neural Network for Efficient Multi-sensory IntegrationShuncheng Jia, Ruichen Zuo, Tielin Zhang et al.
Network architectures and learning principles are key in forming complex functions in artificial neural networks (ANNs) and spiking neural networks (SNNs). SNNs are considered the new-generation artificial networks by incorporating more biological features than ANNs, including dynamic spiking neurons, functionally specified architectures, and efficient learning paradigms. In this paper, we propose a Motif-topology and Reward-learning improved SNN (MR-SNN) for efficient multi-sensory integration. MR-SNN contains 13 types of 3-node Motif topologies which are first extracted from independent single-sensory learning paradigms and then integrated for multi-sensory classification. The experimental results showed higher accuracy and stronger robustness of the proposed MR-SNN than other conventional SNNs without using Motifs. Furthermore, the proposed reward learning paradigm was biologically plausible and can better explain the cognitive McGurk effect caused by incongruent visual and auditory sensory signals.
AIJun 15, 2021
Population-coding and Dynamic-neurons improved Spiking Actor Network for Reinforcement LearningDuzhen Zhang, Tielin Zhang, Shuncheng Jia et al.
With the Deep Neural Networks (DNNs) as a powerful function approximator, Deep Reinforcement Learning (DRL) has been excellently demonstrated on robotic control tasks. Compared to DNNs with vanilla artificial neurons, the biologically plausible Spiking Neural Network (SNN) contains a diverse population of spiking neurons, making it naturally powerful on state representation with spatial and temporal information. Based on a hybrid learning framework, where a spike actor-network infers actions from states and a deep critic network evaluates the actor, we propose a Population-coding and Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state representation from two different scales: input coding and neuronal coding. For input coding, we apply population coding with dynamically receptive fields to directly encode each input state component. For neuronal coding, we propose different types of dynamic-neurons (containing 1st-order and 2nd-order neuronal dynamics) to describe much more complex neuronal dynamics. Finally, the PDSAN is trained in conjunction with deep critic networks using the Twin Delayed Deep Deterministic policy gradient algorithm (TD3-PDSAN). Extensive experimental results show that our TD3-PDSAN model achieves better performance than state-of-the-art models on four OpenAI gym benchmark tasks. It is an important attempt to improve RL with SNN towards the effective computation satisfying biological plausibility.
NEOct 23, 2020
Quantum Superposition Inspired Spiking Neural NetworkYinqian Sun, Yi Zeng, Tielin Zhang
Despite advances in artificial intelligence models, neural networks still cannot achieve human performance, partly due to differences in how information is encoded and processed compared to human brain. Information in an artificial neural network (ANN) is represented using a statistical method and processed as a fitting function, enabling handling of structural patterns in image, text, and speech processing. However, substantial changes to the statistical characteristics of the data, for example, reversing the background of an image, dramatically reduce the performance. Here, we propose a quantum superposition spiking neural network (QS-SNN) inspired by quantum mechanisms and phenomena in the brain, which can handle reversal of image background color. The QS-SNN incorporates quantum theory with brain-inspired spiking neural network models from a computational perspective, resulting in more robust performance compared with traditional ANN models, especially when processing noisy inputs. The results presented here will inform future efforts to develop brain-inspired artificial intelligence.
NEOct 9, 2020
Tuning Convolutional Spiking Neural Network with Biologically-plausible Reward PropagationTielin Zhang, Shuncheng Jia, Xiang Cheng et al.
Spiking Neural Networks (SNNs) contain more biologically realistic structures and biologically-inspired learning principles than those in standard Artificial Neural Networks (ANNs). SNNs are considered the third generation of ANNs, powerful on the robust computation with a low computational cost. The neurons in SNNs are non-differential, containing decayed historical states and generating event-based spikes after their states reaching the firing threshold. These dynamic characteristics of SNNs make it difficult to be directly trained with the standard backpropagation (BP), which is also considered not biologically plausible. In this paper, a Biologically-plausible Reward Propagation (BRP) algorithm is proposed and applied to the SNN architecture with both spiking-convolution (with both 1D and 2D convolutional kernels) and full-connection layers. Unlike the standard BP that propagates error signals from post to presynaptic neurons layer by layer, the BRP propagates target labels instead of errors directly from the output layer to all pre-hidden layers. This effort is more consistent with the top-down reward-guiding learning in cortical columns of the neocortex. Synaptic modifications with only local gradient differences are induced with pseudo-BP that might also be replaced with the Spike-Timing Dependent Plasticity (STDP). The performance of the proposed BRP-SNN is further verified on the spatial (including MNIST and Cifar-10) and temporal (including TIDigits and DvsGesture) tasks, where the SNN using BRP has reached a similar accuracy compared to other state-of-the-art BP-based SNNs and saved 50% more computational cost than ANNs. We think the introduction of biologically plausible learning rules to the training procedure of biologically realistic SNNs will give us more hints and inspirations toward a better understanding of the biological system's intelligent nature.
NEOct 7, 2020
Finite Meta-Dynamic Neurons in Spiking Neural Networks for Spatio-temporal LearningXiang Cheng, Tielin Zhang, Shuncheng Jia et al.
Spiking Neural Networks (SNNs) have incorporated more biologically-plausible structures and learning principles, hence are playing critical roles in bridging the gap between artificial and natural neural networks. The spikes are the sparse signals describing the above-threshold event-based firing and under-threshold dynamic computation of membrane potentials, which give us an alternative uniformed and efficient way on both information representation and computation. Inspired from the biological network, where a finite number of meta neurons integrated together for various of cognitive functions, we proposed and constructed Meta-Dynamic Neurons (MDN) to improve SNNs for a better network generalization during spatio-temporal learning. The MDNs are designed with basic neuronal dynamics containing 1st-order and 2nd-order dynamics of membrane potentials, including the spatial and temporal meta types supported by some hyper-parameters. The MDNs generated from a spatial (MNIST) and a temporal (TIDigits) datasets first, and then extended to various other different spatio-temporal tasks (including Fashion-MNIST, NETtalk, Cifar-10, TIMIT and N-MNIST). The comparable accuracy was reached compared to other SOTA SNN algorithms, and a better generalization was also achieved by SNNs using MDNs than that without using MDNs.