Shibo Zhou

CV
h-index18
11papers
245citations
Novelty63%
AI Score43

11 Papers

CVDec 10, 2024Code
Efficient 3D Recognition with Event-driven Spike Sparse Convolution

Xuerui Qiu, Man Yao, Jieyuan Zhang et al.

Spiking Neural Networks (SNNs) provide an energy-efficient way to extract 3D spatio-temporal features. Point clouds are sparse 3D spatial data, which suggests that SNNs should be well-suited for processing them. However, when applying SNNs to point clouds, they often exhibit limited performance and fewer application scenarios. We attribute this to inappropriate preprocessing and feature extraction methods. To address this issue, we first introduce the Spike Voxel Coding (SVC) scheme, which encodes the 3D point clouds into a sparse spike train space, reducing the storage requirements and saving time on point cloud preprocessing. Then, we propose a Spike Sparse Convolution (SSC) model for efficiently extracting 3D sparse point cloud features. Combining SVC and SSC, we design an efficient 3D SNN backbone (E-3DSNN), which is friendly with neuromorphic hardware. For instance, SSC can be implemented on neuromorphic chips with only minor modifications to the addressing function of vanilla spike convolution. Experiments on ModelNet40, KITTI, and Semantic KITTI datasets demonstrate that E-3DSNN achieves state-of-the-art (SOTA) results with remarkable efficiency. Notably, our E-3DSNN (1.87M) obtained 91.7\% top-1 accuracy on ModelNet40, surpassing the current best SNN baselines (14.3M) by 3.0\%. To our best knowledge, it is the first direct training 3D SNN backbone that can simultaneously handle various 3D computer vision tasks (e.g., classification, detection, and segmentation) with an event-driven nature. Code is available: https://github.com/bollossom/E-3DSNN/.

CVMay 24, 2025Code
Spiking Neural Networks Need High Frequency Information

Yuetong Fang, Deming Zhou, Ziqing Wang et al.

Spiking Neural Networks promise brain-inspired and energy-efficient computation by transmitting information through binary (0/1) spikes. Yet, their performance still lags behind that of artificial neural networks, often assumed to result from information loss caused by sparse and binary activations. In this work, we challenge this long-standing assumption and reveal a previously overlooked frequency bias: spiking neurons inherently suppress high-frequency components and preferentially propagate low-frequency information. This frequency-domain imbalance, we argue, is the root cause of degraded feature representation in SNNs. Empirically, on Spiking Transformers, adopting Avg-Pooling (low-pass) for token mixing lowers performance to 76.73% on Cifar-100, whereas replacing it with Max-Pool (high-pass) pushes the top-1 accuracy to 79.12%. Accordingly, we introduce Max-Former that restores high-frequency signals through two frequency-enhancing operators: (1) extra Max-Pool in patch embedding, and (2) Depth-Wise Convolution in place of self-attention. Notably, Max-Former attains 82.39% top-1 accuracy on ImageNet using only 63.99M parameters, surpassing Spikformer (74.81%, 66.34M) by +7.58%. Extending our insight beyond transformers, our Max-ResNet-18 achieves state-of-the-art performance on convolution-based benchmarks: 97.17% on CIFAR-10 and 83.06% on CIFAR-100. We hope this simple yet effective solution inspires future research to explore the distinctive nature of spiking neural networks. Code is available: https://github.com/bic-L/MaxFormer.

CVOct 21, 2024
Enhancing SNN-based Spatio-Temporal Learning: A Benchmark Dataset and Cross-Modality Attention Model

Shibo Zhou, Bo Yang, Mengwen Yuan et al.

Spiking Neural Networks (SNNs), renowned for their low power consumption, brain-inspired architecture, and spatio-temporal representation capabilities, have garnered considerable attention in recent years. Similar to Artificial Neural Networks (ANNs), high-quality benchmark datasets are of great importance to the advances of SNNs. However, our analysis indicates that many prevalent neuromorphic datasets lack strong temporal correlation, preventing SNNs from fully exploiting their spatio-temporal representation capabilities. Meanwhile, the integration of event and frame modalities offers more comprehensive visual spatio-temporal information. Yet, the SNN-based cross-modality fusion remains underexplored. In this work, we present a neuromorphic dataset called DVS-SLR that can better exploit the inherent spatio-temporal properties of SNNs. Compared to existing datasets, it offers advantages in terms of higher temporal correlation, larger scale, and more varied scenarios. In addition, our neuromorphic dataset contains corresponding frame data, which can be used for developing SNN-based fusion methods. By virtue of the dual-modal feature of the dataset, we propose a Cross-Modality Attention (CMA) based fusion method. The CMA model efficiently utilizes the unique advantages of each modality, allowing for SNNs to learn both temporal and spatial attention scores from the spatio-temporal features of event and frame modalities, subsequently allocating these scores across modalities to enhance their synergy. Experimental results demonstrate that our method not only improves recognition accuracy but also ensures robustness across diverse scenarios.

AISep 22, 2025
Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models

Jun Ling, Yao Qi, Tao Huang et al.

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately handling complex tables -- those with large sizes, deeply nested structures, and semantically rich or irregular cell content -- where existing methods often fail. We begin with a comprehensive analysis, identifying key challenges and highlighting the limitations of current evaluation protocols. To overcome these issues, we propose a reinforced multimodal large language model (MLLM) framework, where a pre-trained MLLM is fine-tuned on a large-scale table-to-LaTeX dataset. To further improve generation quality, we introduce a dual-reward reinforcement learning strategy based on Group Relative Policy Optimization (GRPO). Unlike standard approaches that optimize purely over text outputs, our method incorporates both a structure-level reward on LaTeX code and a visual fidelity reward computed from rendered outputs, enabling direct optimization of the visual output quality. We adopt a hybrid evaluation protocol combining TEDS-Structure and CW-SSIM, and show that our method achieves state-of-the-art performance, particularly on structurally complex tables, demonstrating the effectiveness and robustness of our approach.

LGJan 25, 2021
Spectrum Attention Mechanism for Time Series Classification

Shibo Zhou, Yu Pan

Time series classification(TSC) has always been an important and challenging research task. With the wide application of deep learning, more and more researchers use deep learning models to solve TSC problems. Since time series always contains a lot of noise, which has a negative impact on network training, people usually filter the original data before training the network. The existing schemes are to treat the filtering and training as two stages, and the design of the filter requires expert experience, which increases the design difficulty of the algorithm and is not universal. We note that the essence of filtering is to filter out the insignificant frequency components and highlight the important ones, which is similar to the attention mechanism. In this paper, we propose an attention mechanism that acts on spectrum (SAM). The network can assign appropriate weights to each frequency component to achieve adaptive filtering. We use L1 regularization to further enhance the frequency screening capability of SAM. We also propose a segmented-SAM (SSAM) to avoid the loss of time domain information caused by using the spectrum of the whole sequence. In which, a tumbling window is introduced to segment the original data. Then SAM is applied to each segment to generate new features. We propose a heuristic strategy to search for the appropriate number of segments. Experimental results show that SSAM can produce better feature representations, make the network converge faster, and improve the robustness and classification accuracy.

CVJan 21, 2021
A Spike Learning System for Event-driven Object Recognition

Shibo Zhou, Wei Wang, Xiaohua Li et al.

Event-driven sensors such as LiDAR and dynamic vision sensor (DVS) have found increased attention in high-resolution and high-speed applications. A lot of work has been conducted to enhance recognition accuracy. However, the essential topic of recognition delay or time efficiency is largely under-explored. In this paper, we present a spiking learning system that uses the spiking neural network (SNN) with a novel temporal coding for accurate and fast object recognition. The proposed temporal coding scheme maps each event's arrival time and data into SNN spike time so that asynchronously-arrived events are processed immediately without delay. The scheme is integrated nicely with the SNN's asynchronous processing capability to enhance time efficiency. A key advantage over existing systems is that the event accumulation time for each recognition task is determined automatically by the system rather than pre-set by the user. The system can finish recognition early without waiting for all the input events. Extensive experiments were conducted over a list of 7 LiDAR and DVS datasets. The results demonstrated that the proposed system had state-of-the-art recognition accuracy while achieving remarkable time efficiency. Recognition delay was shown to reduce by 56.3% to 91.7% in various experiment settings over the popular KITTI dataset.

LGOct 15, 2020
Spiking Neural Networks with Single-Spike Temporal-Coded Neurons for Network Intrusion Detection

Shibo Zhou, Xiaohua Li

Spiking neural network (SNN) is interesting due to its strong bio-plausibility and high energy efficiency. However, its performance is falling far behind conventional deep neural networks (DNNs). In this paper, considering a general class of single-spike temporal-coded integrate-and-fire neurons, we analyze the input-output expressions of both leaky and nonleaky neurons. We show that SNNs built with leaky neurons suffer from the overly-nonlinear and overly-complex input-output response, which is the major reason for their difficult training and low performance. This reason is more fundamental than the commonly believed problem of nondifferentiable spikes. To support this claim, we show that SNNs built with nonleaky neurons can have a less-complex and less-nonlinear input-output response. They can be easily trained and can have superior performance, which is demonstrated by experimenting with the SNNs over two popular network intrusion detection datasets, i.e., the NSL-KDD and the AWID datasets. Our experiment results show that the proposed SNNs outperform a comprehensive list of DNN models and classic machine learning models. This paper demonstrates that SNNs can be promising and competitive in contrast to common beliefs.

CVJan 24, 2020
Temporal Pulses Driven Spiking Neural Network for Fast Object Recognition in Autonomous Driving

Wei Wang, Shibo Zhou, Jingxi Li et al.

Accurate real-time object recognition from sensory data has long been a crucial and challenging task for autonomous driving. Even though deep neural networks (DNNs) have been successfully applied in this area, most existing methods still heavily rely on the pre-processing of the pulse signals derived from LiDAR sensors, and therefore introduce additional computational overhead and considerable latency. In this paper, we propose an approach to address the object recognition problem directly with raw temporal pulses utilizing the spiking neural network (SNN). Being evaluated on various datasets (including Sim LiDAR, KITTI and DVS-barrel) derived from LiDAR and dynamic vision sensor (DVS), our proposed method has shown comparable performance as the state-of-the-art methods, while achieving remarkable time efficiency. It highlights the SNN's great potentials in autonomous driving and related applications. To the best of our knowledge, this is the first attempt to use SNN to directly perform object recognition on raw temporal pulses.

CVDec 17, 2019
Deep SCNN-based Real-time Object Detection for Self-driving Vehicles Using LiDAR Temporal Data

Shibo Zhou, Ying Chen, Xiaohua Li et al.

Real-time accurate detection of three-dimensional (3D) objects is a fundamental necessity for self-driving vehicles. Most existing computer vision approaches are based on convolutional neural networks (CNNs). Although the CNN-based approaches can achieve high detection accuracy, their high energy consumption is a severe drawback. To resolve this problem, novel energy efficient approaches should be explored. Spiking neural network (SNN) is a promising candidate because it has orders-of-magnitude lower energy consumption than CNN. Unfortunately, the studying of SNN has been limited in small networks only. The application of SNN for large 3D object detection networks has remain largely open. In this paper, we integrate spiking convolutional neural network (SCNN) with temporal coding into the YOLOv2 architecture for real-time object detection. To take the advantage of spiking signals, we develop a novel data preprocessing layer that translates 3D point-cloud data into spike time data. We propose an analog circuit to implement the non-leaky integrate and fire neuron used in our SCNN, from which the energy consumption of each spike is estimated. Moreover, we present a method to calculate the network sparsity and the energy consumption of the overall network. Extensive experiments have been conducted based on the KITTI dataset, which show that the proposed network can reach competitive detection accuracy as existing approaches, yet with much lower average energy consumption. If implemented in dedicated hardware, our network could have a mean sparsity of 56.24% and extremely low total energy consumption of 0.247mJ only. Implemented in NVIDIA GTX 1080i GPU, we can achieve 35.7 fps frame rate, high enough for real-time object detection.

CVSep 24, 2019
Temporal-Coded Deep Spiking Neural Network with Easy Training and Robust Performance

Shibo Zhou, Xiaohua LI, Ying Chen et al.

Spiking neural network (SNN) is interesting both theoretically and practically because of its strong bio-inspiration nature and potentially outstanding energy efficiency. Unfortunately, its development has fallen far behind the conventional deep neural network (DNN), mainly because of difficult training and lack of widely accepted hardware experiment platforms. In this paper, we show that a deep temporal-coded SNN can be trained easily and directly over the benchmark datasets CIFAR10 and ImageNet, with testing accuracy within 1% of the DNN of equivalent size and architecture. Training becomes similar to DNN thanks to the closed-form solution to the spiking waveform dynamics. Considering that SNNs should be implemented in practical neuromorphic hardwares, we train the deep SNN with weights quantized to 8, 4, 2 bits and with weights perturbed by random noise to demonstrate its robustness in practical applications. In addition, we develop a phase-domain signal processing circuit schematic to implement our spiking neuron with 90% gain of energy efficiency over existing work. This paper demonstrates that the temporal-coded deep SNN is feasible for applications with high performance and high energy efficient.

NEOct 29, 2018
Object Detection based on LIDAR Temporal Pulses using Spiking Neural Networks

Shibo Zhou, Wei Wang

Neural networks has been successfully used in the processing of Lidar data, especially in the scenario of autonomous driving. However, existing methods heavily rely on pre-processing of the pulse signals derived from Lidar sensors and therefore result in high computational overhead and considerable latency. In this paper, we proposed an approach utilizing Spiking Neural Network (SNN) to address the object recognition problem directly with raw temporal pulses. To help with the evaluation and benchmarking, a comprehensive temporal pulses data-set was created to simulate Lidar reflection in different road scenarios. Being tested with regard to recognition accuracy and time efficiency under different noise conditions, our proposed method shows remarkable performance with the inference accuracy up to 99.83% (with 10% noise) and the average recognition delay as low as 265 ns. It highlights the potential of SNN in autonomous driving and some related applications. In particular, to our best knowledge, this is the first attempt to use SNN to directly perform object recognition on raw Lidar temporal pulses.