Pengfei Sun

NE · h-index 13 · 18 papers · 291 citations · Novelty 51% · AI Score 52

18 Papers

NE · Apr 20, 2022
Axonal Delay As a Short-Term Memory for Feed Forward Deep Spiking Neural Networks

Pengfei Sun, Longwei Zhu, Dick Botteldooren

In spiking neural networks (SNNs), information is propagated between adjacent neurons by spikes, a computing paradigm that holds the promise of simulating the human brain. Recent studies have found that neuronal time delays play an important role in the learning process. Configuring the precise timing of spikes is therefore a promising direction for understanding and improving the transmission of temporal information in SNNs. However, most existing learning methods for spiking neurons focus on adjusting synaptic weights, while very little research has addressed axonal delay. In this paper, we verify the effectiveness of integrating time delay into supervised learning and propose a module that modulates axonal delay through short-term memory. To this end, a rectified axonal delay (RAD) module is integrated with the spiking model to align spike timing and thus improve its ability to learn temporal features. Experiments on three neuromorphic benchmark datasets (NMNIST, DVS Gesture, and N-TIDIGITS18) show that the proposed method achieves state-of-the-art performance while using the fewest parameters.
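An axonal delay can be pictured as shifting a neuron's output spike train later in time before it reaches the next layer, so that spikes emitted at different moments arrive together. The NumPy sketch below assumes integer, per-neuron delays measured in time steps and a binary spike raster of shape (neurons, time steps); the function name `apply_axonal_delay` is hypothetical, and neither the learning of the delays nor the rectification inside the RAD module is modeled.

```python
import numpy as np

def apply_axonal_delay(spikes: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Shift each neuron's spike train later by its own integer delay.

    spikes: binary array of shape (n_neurons, n_steps)
    delays: integer array of shape (n_neurons,); negative values are clamped to 0
    Spikes pushed past the last time step are dropped.
    """
    n_neurons, n_steps = spikes.shape
    out = np.zeros_like(spikes)
    for i, d in enumerate(delays):
        d = int(max(d, 0))  # delays are clamped at zero in this sketch
        if d < n_steps:
            out[i, d:] = spikes[i, : n_steps - d]
    return out

# Two neurons firing at t=0 and t=1 are aligned to arrive together at t=2.
aligned = apply_axonal_delay(np.array([[1, 0, 0, 0], [0, 1, 0, 0]]), np.array([2, 1]))
```

Delaying the earlier spike by more than the later one is what lets a downstream neuron see both inputs coincide.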

NE · Oct 23, 2023
Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

Pengfei Sun, Jibin Wu, Malu Zhang et al.

Recurrent Neural Networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies, making them highly prevalent in sequential data processing applications. Nevertheless, vanilla RNNs suffer from the well-known vanishing and exploding gradient problems, which make long-range dependencies difficult to learn. Gated RNNs, meanwhile, tend to be over-parameterized, resulting in poor computational efficiency and generalization. To address these challenges, this paper proposes a novel Delayed Memory Unit (DMU). The DMU incorporates a delay line structure with delay gates into a vanilla RNN, thereby enhancing temporal interaction and facilitating temporal credit assignment. Specifically, the DMU is designed to distribute input information directly to the optimal future time instant, rather than aggregating and redistributing it over time through intricate network dynamics. The proposed DMU demonstrates superior temporal modeling across a broad range of sequential tasks, including speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification, while using considerably fewer parameters than other state-of-the-art gated RNN models.
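The idea of sending information directly to a future time step, rather than relaying it through the recurrent state, can be illustrated with a delay line whose taps are weighted by an input-dependent gate. This is a hypothetical NumPy sketch, not the published DMU equations: the function name, the softmax gate over `D` taps, and the rule that tap `k` delivers `k` steps later are all assumptions made for illustration.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def delay_gated_rnn(xs, W_x, W_h, W_g):
    """Hypothetical RNN whose candidate state is spread over future steps.

    xs:  inputs, shape (T, n_in)
    W_x: input weights, shape (n_hid, n_in)
    W_h: recurrent weights, shape (n_hid, n_hid)
    W_g: delay-gate weights, shape (D, n_in); tap k delivers k steps later
    Returns hidden states, shape (T, n_hid).
    """
    n_hid, D = W_x.shape[0], W_g.shape[0]
    line = np.zeros((D, n_hid))  # delay line: line[k] arrives k steps from now
    h = np.zeros(n_hid)
    states = []
    for x in xs:
        c = np.tanh(W_x @ x + W_h @ h)   # candidate from current input and last state
        g = softmax(W_g @ x)             # delay gate: weights over the D taps
        line += g[:, None] * c[None, :]  # schedule c across the next D steps
        h = line[0].copy()               # content arriving at the current step
        line = np.vstack([line[1:], np.zeros((1, n_hid))])  # advance one step
        states.append(h)
    return np.array(states)
```

With a single tap (`D = 1`) the gate is always 1 and the cell reduces to a plain tanh RNN, which is a convenient sanity check; with more taps, part of each candidate is held back and delivered later.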

90.5 · NE · May 2
Algorithm-hardware co-design of neuromorphic networks with dual memory pathways

Pengfei Sun, Zhe Su, Jascha Achterberg et al. · Cambridge

Spiking neural networks excel at event-driven sensing. Yet maintaining task-relevant context over long timescales, both algorithmically and in hardware, while respecting tight energy and memory budgets remains a core challenge in the field. We address this challenge through an algorithm-hardware co-design effort. At the algorithm level, inspired by the cortical fast-slow organization in the brain, we introduce a neural network with an explicit slow memory pathway that, combined with fast spiking activity, forms a dual memory pathway (DMP) architecture in which each layer maintains a compact low-dimensional state that summarizes recent activity and modulates spiking dynamics. This explicit memory stabilizes learning while preserving event-driven sparsity, achieving competitive accuracy on long-sequence benchmarks with 40-60% fewer parameters than equivalent state-of-the-art spiking neural networks. At the hardware level, we introduce a near-memory-compute architecture that fully leverages the DMP architecture by retaining its compact shared state while optimizing dataflow across the heterogeneous sparse-spike and dense-memory pathways. Experimental results show more than a 4× increase in throughput and over a 5× improvement in energy efficiency compared with state-of-the-art implementations. Together, these contributions demonstrate that biological principles can guide functional abstractions that are both algorithmically effective and hardware-efficient, establishing a scalable co-design framework for real-time neuromorphic computation and learning.
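To make the fast/slow split concrete, here is a minimal NumPy sketch of one layer of leaky integrate-and-fire neurons whose input is modulated by a low-dimensional, slowly decaying state. Everything in it is an assumption for illustration: the function name, the projections `P` and `Q`, the decay constants, and the hard reset. The paper's actual DMP update rules and its hardware dataflow are not reproduced.

```python
import numpy as np

def dual_pathway_layer(inputs, W_in, P, Q, alpha=0.9, beta=0.8, threshold=1.0):
    """Hypothetical LIF layer with a fast spiking path and a slow k-dim memory.

    inputs: (T, n_in); W_in: (n, n_in)
    P: (k, n) compresses spikes into the slow state (k much smaller than n)
    Q: (n, k) feeds the slow state back into the membrane input
    alpha: slow-state retention; beta: membrane leak
    Returns spikes (T, n) and slow states (T, k).
    """
    n, k = W_in.shape[0], P.shape[0]
    v, s, m = np.zeros(n), np.zeros(n), np.zeros(k)
    spikes, states = [], []
    for x in inputs:
        m = alpha * m + (1 - alpha) * (P @ s)  # slow path: leaky summary of recent spikes
        v = beta * v + W_in @ x + Q @ m        # fast path: leaky membrane, modulated by m
        s = (v >= threshold).astype(float)     # spike where the threshold is reached
        v = v * (1.0 - s)                      # hard reset for neurons that fired
        spikes.append(s)
        states.append(m.copy())
    return np.array(spikes), np.array(states)
```

Because the slow state has only `k` entries per layer, it is cheap to keep resident next to the compute, which is the kind of compact shared state the abstract describes exploiting in hardware.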

89.9 · MA · Apr 21 · Code
CogGen: A Cognitively Inspired Recursive Framework for Deep Research Report Generation

Kuo Tian, Pengfei Sun, Zhen Wu et al.

The autonomous synthesis of deep research reports represents a critical frontier for Large Language Models (LLMs), demanding sophisticated information orchestration and non-linear narrative logic. Current approaches rely on rigid, predefined linear workflows, which cause error accumulation, preclude global restructuring based on later insights, and ultimately limit in-depth multimodal fusion and report quality. We propose CogGen, a Cognitively inspired recursive framework for deep research report Generation. Leveraging a Hierarchical Recursive Architecture to simulate cognitive writing, CogGen enables flexible planning and global restructuring. To extend this recursivity to multimodal content, we introduce Abstract Visual Representation (AVR): a concise, intent-driven language that iteratively refines visual-text layouts without the overhead of pixel-level regeneration. We further present CLEF, a Cognitive Load Evaluation Framework, and curate a new benchmark from Our World in Data (OWID). Extensive experiments show that CogGen achieves state-of-the-art results among open-source systems, generating reports comparable to those of professional analysts and surpassing Gemini Deep Research. Our code and dataset are available at https://github.com/NJUNLP/CogGen.

LG · Dec 18, 2025
Batch Normalization-Free Fully Integer Quantized Neural Networks via Progressive Tandem Learning

Pengfei Sun, Wenyu Jiang, Piew Yoong Chee et al.

Quantised neural networks (QNNs) shrink models and reduce inference energy through low-bit arithmetic, yet most still depend on a batch normalisation (BN) layer with running statistics, which prevents true integer-only deployment. Prior attempts remove BN by parameter folding or tailored initialisation; while helpful, they rarely recover BN's stability and accuracy and often impose bespoke constraints. We present a BN-free, fully integer QNN trained via a progressive, layer-wise distillation scheme that slots into existing low-bit pipelines. Starting from a pretrained BN-enabled teacher, we use layer-wise targets and progressive compensation to train a student that performs inference exclusively with integer arithmetic and contains no BN operations. On ImageNet with AlexNet, the BN-free model attains competitive Top-1 accuracy under aggressive quantisation. The procedure integrates directly with standard quantisation workflows, enabling end-to-end integer-only inference in resource-constrained settings such as edge and embedded devices.
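For context on the "parameter folding" baseline mentioned above, the standard way to remove an inference-time BN layer is to fold its running statistics into the preceding linear layer (or, per output channel, a convolution). This sketch shows that prior technique, not the paper's progressive tandem learning; the function name is hypothetical, and the folded parameters are still real-valued and would need quantising afterwards.

```python
import numpy as np

def fold_bn_into_linear(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (W x + b - mean) / sqrt(var + eps) + beta into W', b'.

    W: (out, in), b: (out,); gamma, beta, mean, var: (out,) BN parameters
    Returns (W_folded, b_folded) such that W_folded @ x + b_folded == y.
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel BN scale
    W_folded = W * scale[:, None]        # scale each output row of W
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded
```

Folding is exact for a frozen BN layer, which is why it is attractive; the abstract's point is that removing BN this way rarely recovers the training stability and accuracy BN provides.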

SP · Mar 21, 2024
EEG decoding with conditional identification information

Pengfei Sun, Jorg De Winne, Paul Devos et al.

Decoding EEG signals is crucial for understanding the human brain and advancing brain-computer interfaces. Traditional machine learning algorithms have been hindered by the high noise levels and inherent inter-person variations in EEG signals. Recent advances in deep neural networks (DNNs) have shown promise, owing to their advanced nonlinear modeling capabilities. However, DNNs still struggle to decode EEG samples from unseen individuals. To address this, this paper incorporates the conditional identification information of each individual into the neural network, enhancing model representation through the synergistic interaction of EEG and personal traits. We test our model on the WithMe dataset and demonstrate that including these identifiers substantially boosts accuracy both for subjects in the training set and for unseen subjects. This enhancement suggests promising potential for improving EEG interpretability and for understanding the relevant identification features.

NE · Jul 21, 2025
Beyond Rate Coding: Surrogate Gradients Enable Spike Timing Learning in Spiking Neural Networks

Ziqiao Yu, Pengfei Sun, Dan F. M. Goodman

We investigate the extent to which Spiking Neural Networks (SNNs) trained with Surrogate Gradient Descent (Surrogate GD), with and without delay learning, can learn from precise spike timing beyond firing rates. We first design synthetic tasks that isolate intra-neuron inter-spike intervals and cross-neuron synchrony under matched spike counts. On more complex spike-based speech recognition datasets (Spiking Heidelberg Digits (SHD) and Spiking Speech Commands (SSC)), we construct variants in which spike count information is eliminated and only timing information remains, and show that Surrogate GD-trained SNNs perform significantly above chance whereas purely rate-based models perform at chance level. We further evaluate robustness under biologically inspired perturbations, including per-spike or per-neuron Gaussian jitter and spike deletion, and observe consistent but perturbation-specific degradation. Networks show a sharp performance drop when spike sequences are reversed in time, with a larger drop for SNNs trained with delays, indicating that these networks are more human-like in their behaviour. To facilitate further studies of temporal coding, we have released our modified SHD and SSC datasets.
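The manipulations described above are easy to state on an event list. This sketch assumes spikes are stored as `(neuron, time)` tuples, which is an illustrative choice and not the format of the released SHD/SSC variants; all function names are hypothetical. It shows that two patterns with matched counts are indistinguishable to a rate code, and how per-spike jitter, per-neuron jitter, and time reversal act on the timing.

```python
import numpy as np

def spike_counts(spikes, n_neurons):
    """Rate code: spikes per neuron; all timing information is discarded."""
    counts = np.zeros(n_neurons, dtype=int)
    for neuron, _ in spikes:
        counts[neuron] += 1
    return counts

def jitter_per_spike(spikes, sigma, rng):
    """Independent Gaussian noise added to every spike time."""
    return [(n, t + rng.normal(0.0, sigma)) for n, t in spikes]

def jitter_per_neuron(spikes, sigma, rng, n_neurons):
    """One Gaussian offset per neuron; intra-neuron intervals are preserved."""
    offsets = rng.normal(0.0, sigma, size=n_neurons)
    return [(n, t + offsets[n]) for n, t in spikes]

def time_reverse(spikes, duration):
    """Mirror every spike time within [0, duration]; output sorted by (neuron, time)."""
    return sorted((n, duration - t) for n, t in spikes)

# Same counts, different timing: a rate-based readout cannot tell these apart.
pattern_a = [(0, 1.0), (1, 2.0)]
pattern_b = [(0, 2.0), (1, 1.0)]
```

Per-neuron jitter shifts a neuron's whole train rigidly, so it disrupts cross-neuron synchrony but not inter-spike intervals, whereas per-spike jitter disrupts both; this is one reason the two perturbations can degrade performance differently.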

CR · Sep 1, 2021
Let Your Camera See for You: A Novel Two-Factor Authentication Method against Real-Time Phishing Attacks

Yuanyi Sun, Sencun Zhu, Yao Zhao et al.

Today, two-factor authentication (2FA) is a widely implemented mechanism to counter phishing attacks. Although much effort has been invested in 2FA, most 2FA systems are still vulnerable to carefully designed phishing attacks, and some even require special hardware, which limits their wide deployment. Recently, real-time phishing (RTP) has made the situation even worse, because an adversary can effortlessly set up a phishing website replicating a target website without any background in web page design. Traditional 2FA can easily be bypassed by such RTP attacks. In this work, we propose a novel 2FA system to counter RTP attacks. The main idea is to ask the user to take a photo of the web browser, with the domain name visible in the address bar, as the second authentication factor. The server extracts the domain name using Optical Character Recognition (OCR) and then determines whether the user is visiting the genuine website or a fake one, thus defeating RTP attacks in which an adversary must set up a fake website with a different domain. We prototyped our system, PhotoAuth, and evaluated its performance in various environments. The results show that PhotoAuth is an effective technique with good scalability. We also show that, compared with other 2FA systems, PhotoAuth has several advantages; most notably, no special hardware or software is needed on the client side apart from a phone, making it readily deployable.
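The server-side decision after OCR reduces to parsing a hostname out of the recognized address-bar text and comparing it with the expected domain. This is a minimal sketch under stated assumptions: the OCR step is not shown (its output is taken as a plain string), the function names are hypothetical, and the simple suffix rule stands in for a production check that would normally use the registrable domain from a public-suffix list.

```python
from urllib.parse import urlsplit

def host_from_address_bar(ocr_text: str) -> str:
    """Parse the hostname out of OCR'd address-bar text (OCR itself not shown)."""
    text = ocr_text.strip().lower()
    if "://" not in text:
        text = "https://" + text  # browsers often hide the scheme in the address bar
    return urlsplit(text).hostname or ""

def matches_expected_domain(ocr_text: str, expected_domain: str) -> bool:
    """Accept the expected domain or one of its subdomains, nothing else."""
    host = host_from_address_bar(ocr_text)
    expected = expected_domain.lower()
    return host == expected or host.endswith("." + expected)
```

Matching on a `.`-prefixed suffix rather than a substring matters: a look-alike host such as `example.com.evil.net` contains the genuine name but is rejected.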

IV · May 7, 2021
NTIRE 2021 Challenge on Perceptual Image Quality Assessment

Jinjin Gu, Haoming Cai, Chao Dong et al.

This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GANs) produce images with more realistic textures. These output images have characteristics completely different from traditional distortions and thus pose a new challenge for IQA methods that evaluate their visual quality. Compared with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores, so they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge had 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them achieved much better results than existing IQA methods, and the winning method demonstrates state-of-the-art performance.

CV · Nov 18, 2019
GLMNet: Graph Learning-Matching Networks for Feature Matching

Bo Jiang, Pengfei Sun, Jin Tang et al.

Recently, graph convolutional networks (GCNs) have shown great potential for the task of graph matching. They can integrate graph node feature embedding, node-wise affinity learning, and matching optimization in a unified end-to-end model. One important aspect of graph matching is the construction of the two matching graphs. However, the matching graphs fed to existing graph convolutional matching networks are generally fixed and independent of graph matching, and are thus not guaranteed to be optimal for the graph matching task. Also, existing GCN matching methods employ several general smoothing-based graph convolutional layers to generate graph node embeddings, in which extensive smoothing may dilute the discriminative information of graph nodes. To overcome these issues, we propose a novel Graph Learning-Matching Network (GLMNet) for the graph matching problem. GLMNet has three main aspects. (1) It integrates graph learning into graph matching, adaptively learning a pair of optimal graphs that best serve the graph matching task. (2) It employs a Laplacian sharpening convolutional module to generate more discriminative node embeddings for graph matching. (3) A new constraint-regularized loss is designed for GLMNet training, which encodes the desired one-to-one matching constraints in matching optimization. Experiments on two benchmarks demonstrate the effectiveness of GLMNet and the advantages of its main modules.
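The contrast between smoothing and sharpening convolutions can be seen on a two-node graph. This sketch assumes one common construction, not necessarily GLMNet's exact module: smoothing multiplies features by the symmetrically normalized adjacency with self-loops, `Â`, and sharpening multiplies by `2I - Â`, which equals `H + (I - Â)H` and so adds back each node's difference from its neighborhood average. Function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    A_tilde = A + np.eye(A.shape[0])  # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def smooth(A: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Smoothing convolution: pulls connected nodes' features together."""
    return normalized_adjacency(A) @ H

def sharpen(A: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Laplacian sharpening (assumed form 2I - A_hat): pushes them apart."""
    return (2 * np.eye(A.shape[0]) - normalized_adjacency(A)) @ H

A = np.array([[0.0, 1.0], [1.0, 0.0]])  # two connected nodes
H = np.array([[1.0], [0.0]])            # initially distinct features
```

On this graph, smoothing maps the features to `[[0.5], [0.5]]`, erasing the difference entirely, while sharpening gives `[[1.5], [-0.5]]`, doubling it; that erasure is the loss of discriminative information the abstract attributes to stacked smoothing layers.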

CR · Sep 10, 2019
Selfie: User-defined Sensitive Memory Protection and Recovery

Pengfei Sun, Saman Zonouz

Different users have different requirements for what counts as sensitive memory. Aborting program execution as soon as memory corruption is detected is inflexible, because users may lose sensitive data. We present Selfie, a hybrid solution that flexibly protects sensitive memory at runtime according to users' requirements. In addition, Selfie can decide whether execution needs to be recovered. If the memory corruption does not affect sensitive memory directly, Selfie provides a symbolic solver that helps determine whether the corruption could affect sensitive memory in the future.

LG · Sep 3, 2019
Brain2Char: A Deep Architecture for Decoding Text from Brain Recordings

Pengfei Sun, Gopala K. Anumanchipalli, Edward F. Chang

Decoding language representations directly from the brain can enable new Brain-Computer Interfaces (BCI) for high-bandwidth human-human and human-machine communication. Clinically, such technologies can restore communication in people with neurological conditions affecting their ability to speak. In this study, we propose a novel deep network architecture, Brain2Char, for directly decoding text (specifically character sequences) from direct brain recordings (electrocorticography, ECoG). The Brain2Char framework combines state-of-the-art deep learning modules: 3D Inception layers for multiband spatiotemporal feature extraction from neural data, and bidirectional recurrent layers and dilated convolution layers followed by language-model-weighted beam search to decode character sequences, optimizing a connectionist temporal classification (CTC) loss. Additionally, given the highly non-linear transformations that underlie the conversion of cortical function to character sequences, we regularize the network's latent representations, motivated by insights into cortical encoding of speech production and by artifactual aspects specific to ECoG data acquisition. To do this, we impose auxiliary losses on latent representations for articulatory movements, speech acoustics, and session-specific non-linearities. In the three participants tested here, Brain2Char achieves Word Error Rates (WER) of 10.6%, 8.5%, and 7.0%, respectively, on vocabulary sizes ranging from 1200 to 1900 words. Brain2Char also performs well when two participants silently mimed sentences. These results set a new state of the art for decoding text from the brain and demonstrate the potential of Brain2Char as a high-performance communication BCI.
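Two pieces of this pipeline are easy to show in isolation: turning a frame-level CTC output into characters (merge repeated symbols, then drop blanks) and scoring a transcript by word error rate. The sketch uses greedy best-path collapsing as a simplified stand-in for the language-model-weighted beam search described above; the blank symbol `_` and the function names are assumptions.

```python
def ctc_greedy_collapse(frame_labels: str, blank: str = "_") -> str:
    """Best-path CTC decoding: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for c in frame_labels:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    dist = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        prev_diag, dist[0] = dist[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(dist[j] + 1,               # deletion
                      dist[j - 1] + 1,           # insertion
                      prev_diag + (r != h))      # substitution or match
            prev_diag, dist[j] = dist[j], cur
    return dist[len(hyp)] / max(len(ref), 1)
```

The blank between repeated letters is what lets CTC emit double characters: `"hh_e_ll_llo"` collapses to `"hello"`, whereas `"hh_e_llllo"` would collapse to `"helo"`.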

NE · Jan 13, 2019
Modeling neural dynamics during speech production using a state space variational autoencoder

Pengfei Sun, David A. Moses, Edward Chang

Characterizing the neural encoding of behavior remains a challenging task in many research areas due in part to complex and noisy spatiotemporal dynamics of evoked brain activity. An important aspect of modeling these neural encodings involves separation of robust, behaviorally relevant signals from background activity, which often contains signals from irrelevant brain processes and decaying information from previous behavioral events. To achieve this separation, we develop a two-branch State Space Variational AutoEncoder (SSVAE) model to individually describe the instantaneous evoked foreground signals and the context-dependent background signals. We modeled the spontaneous speech-evoked brain dynamics using smoothed Gaussian mixture models. By applying the proposed SSVAE model to track ECoG dynamics in one participant over multiple hours, we find that the model can predict speech-related dynamics more accurately than other latent factor inference algorithms. Our results demonstrate that separately modeling the instantaneous speech-evoked and slow context-dependent brain dynamics can enhance tracking performance, which has important implications for the development of advanced neural encoding and decoding models in various neuroscience sub-disciplines.

SD · Dec 16, 2016
Neural networks based EEG-Speech Models

Pengfei Sun, Jun Qin

In this paper, we propose an end-to-end neural network (NN) based EEG-speech (NES) modeling framework, in which three network structures are developed to map imagined EEG signals to phonemes. The proposed NES models incorporate a language-model-based EEG feature extraction layer, an acoustic feature mapping layer, and a restricted Boltzmann machine (RBM) based feature learning layer. The NES models jointly realize the representation of multichannel EEG signals and the projection of acoustic speech signals. Of the three proposed NES models, two augmented networks use spoken EEG signals as either bias or gate information to strengthen the feature learning and translation of imagined EEG signals. Experimental results show that all three proposed NES models outperform the baseline support vector machine (SVM) method on EEG-speech classification. For binary classification, our approach achieves results comparable to a deep belief network approach.

SD · Nov 1, 2016
Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection

Pengfei Sun, Jun Qin

In this letter, we propose enhanced factored three-way restricted Boltzmann machines (EFTW-RBMs) for speech detection. The proposed model incorporates conditional feature learning through multiplicative interaction with the dynamical state of the third unit, which modulates the visible-hidden node pairs. Instead of recursively stacking previous speech frames as the third unit, correlation-related weighting coefficients are assigned to the contextual neighboring frames. Specifically, a threshold function is designed to capture long-term features and blend in the globally stored speech structure. A factored low-rank approximation is introduced to reduce the number of parameters of the three-dimensional interaction tensor, on which a non-negativity constraint is imposed to account for sparsity. Validation using the area under the ROC curve (AUC) and the signal distortion ratio (SDR) shows that our approach outperforms several existing 1D and 2D (i.e., time and time-frequency domain) speech detection algorithms in various noisy environments.
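The parameter saving from a factored low-rank interaction tensor can be checked numerically. This sketch assumes the common three-way factorization `W[i, j, k] = sum_f A[i, f] * B[j, f] * C[k, f]` over visible (`v`), hidden (`h`), and third-unit (`z`) vectors, with non-negativity enforced by clipping the factors; names and shapes are illustrative, and the paper's correlation-based context weighting and threshold function are not modeled.

```python
import numpy as np

def three_way_term_factored(v, h, z, A, B, C):
    """sum_ijk v_i h_j z_k W_ijk with W_ijk = sum_f A_if B_jf C_kf, never forming W.

    A: (n_v, F), B: (n_h, F), C: (n_z, F)
    """
    return float(np.sum((v @ A) * (h @ B) * (z @ C)))

def project_nonnegative(X):
    """Projection step for a non-negativity constraint: clip entries at zero."""
    return np.maximum(X, 0.0)

def parameter_counts(n_v, n_h, n_z, n_factors):
    """Entries in the full tensor versus in the three factor matrices."""
    return n_v * n_h * n_z, n_factors * (n_v + n_h + n_z)
```

For 64 units in each group and 16 factors, the full tensor would have 262,144 entries while the factors have 3,072, and the energy term costs three matrix-vector products instead of a full tensor contraction.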

SD · Oct 3, 2016
Speech Enhancement via Two-Stage Dual Tree Complex Wavelet Packet Transform with a Speech Presence Probability Estimator

Pengfei Sun, Jun Qin

In this paper, a two-stage dual tree complex wavelet packet transform (DTCWPT) based speech enhancement algorithm is proposed, in which a speech presence probability (SPP) estimator and a generalized minimum mean squared error (MMSE) estimator are developed. To overcome the signal distortion caused by downsampling in the WPT, a two-stage analytic decomposition concatenating an undecimated WPT (UWPT) and a decimated WPT is employed. An SPP estimator in the DTCWPT domain is derived assuming a generalized Gamma distribution for speech and Gaussian noise. The validation results show that, in low-SNR nonstationary noise, the proposed algorithm achieves higher perceptual evaluation of speech quality (PESQ) and segmental signal-to-noise ratio (SegSNR) scores than four other state-of-the-art speech enhancement algorithms: optimally modified LSA (OM-LSA), soft masking using a posteriori SNR uncertainty (SMPO), a posteriori SPP based MMSE estimation (MMSE-SPP), and adaptive Bayesian wavelet thresholding (BWT).
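To show what an SPP estimator computes, here is the textbook posterior speech presence probability under Gaussian speech and noise models, used to weight a Wiener gain. This is a deliberately swapped-in, simpler model: the paper derives its estimator from a generalized Gamma speech prior in the DTCWPT domain and pairs it with a generalized MMSE estimator, neither of which is reproduced here. The fixed prior absence probability `q` and the function names are assumptions.

```python
import numpy as np

def speech_presence_probability(gamma, xi, q=0.5):
    """Posterior P(speech present | observation) under Gaussian models.

    gamma: a posteriori SNR (observed power / noise power)
    xi:    a priori SNR (speech power / noise power)
    q:     prior probability that speech is absent
    """
    v = gamma * xi / (1.0 + xi)
    return 1.0 / (1.0 + (q / (1.0 - q)) * (1.0 + xi) * np.exp(-v))

def spp_weighted_wiener_gain(gamma, xi, q=0.5):
    """Wiener gain xi / (1 + xi) scaled by the speech presence probability."""
    return speech_presence_probability(gamma, xi, q) * xi / (1.0 + xi)
```

With no a priori SNR (`xi = 0`) the posterior falls back to the prior `1 - q`; as the observed SNR grows the posterior approaches 1, so the gain attenuates coefficients that are likely noise-only more strongly than a plain Wiener gain would.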

SD · Sep 29, 2016
Semi-supervised Speech Enhancement in Envelop and Details Subspaces

Pengfei Sun, Jun Qin

In this study, we propose a modulation-decoupling-based single-channel speech enhancement subspace framework, in which the spectrogram of noisy speech is decoupled into the product of a spectral envelope subspace and a spectral details subspace. This decoupling makes it possible to specifically target the noise that most affects intelligibility. Two supervised low-rank and sparse decomposition schemes are developed in the spectral envelope subspace to obtain robust recovery of speech components. A Bayesian formulation of non-negative matrix factorization is used to learn the speech dictionary from the spectral envelope subspace of clean speech samples. In the spectral details subspace, standard robust principal component analysis (RPCA) is applied to extract the speech components. The validation results show that, compared with four speech enhancement algorithms (MMSE-SPP, NMF-RPCA, RPCA, and LARC), the proposed MS-based algorithms achieve satisfactory performance in improving perceptual quality and, especially, speech intelligibility.
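For reference, here is a minimal solver for principal component pursuit, the convex formulation of RPCA, which splits a matrix `M` into a low-rank part `L` and a sparse part `S`. It uses a basic augmented-Lagrangian (ADMM) loop with the default `lam` and `mu` suggested by Candès et al. It is a generic sketch and not the paper's pipeline: applied to a magnitude spectrogram, `L` would typically absorb the repetitive background and `S` the sparse components, while the envelope/detail decoupling and dictionary learning are omitted.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft thresholding (proximal operator of the L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: min ||L||_* + lam * ||S||_1 s.t. L + S = M."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else m * n / (4.0 * np.abs(M).sum())
    L, S, Y = np.zeros_like(M), np.zeros_like(M), np.zeros_like(M)
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)     # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)  # sparse update
        residual = M - L - S
        Y = Y + mu * residual                 # dual (multiplier) update
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

The two proximal steps are the whole algorithm: singular value thresholding pushes `L` toward low rank, and soft thresholding pushes `S` toward sparsity, while the multiplier `Y` enforces `L + S = M`.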

SD · Sep 29, 2016
Low Rank and Sparsity Analysis Applied to Speech Enhancement via Online Estimated Dictionary

Pengfei Sun, Jun Qin

We propose a single-channel speech enhancement algorithm based on an online estimated dictionary and built on low-rank and sparse matrix decomposition. In the proposed algorithm, a noisy speech spectral matrix is modeled as the sum of low-rank background noise components and an activation of the online speech dictionary, on which both low-rank and sparsity constraints are imposed. This decomposition takes advantage of the high expressiveness of the locally estimated dictionary for speech components. The local dictionary is obtained by estimating the speech presence probability with the Expectation-Maximization (EM) algorithm, using a generalized Gamma prior for the speech magnitude spectrum. The evaluation results show that the proposed algorithm achieves significant improvements over four other speech enhancement algorithms.