CV · Jul 21, 2022
Efficient CNN Architecture Design Guided by Visualization
Liangqi Zhang, Haibo Shen, Yihao Luo et al.
Modern efficient Convolutional Neural Networks (CNNs) commonly use Depthwise Separable Convolutions (DSCs) and Neural Architecture Search (NAS) to reduce the number of parameters and the computational complexity, but some inherent characteristics of networks are overlooked. Inspired by visualizing feature maps and N$\times$N (N$>$1) convolution kernels, this paper introduces several guidelines to further improve parameter efficiency and inference speed. Based on these guidelines, our parameter-efficient CNN architecture, VGNetG, achieves better accuracy and lower latency than previous networks while using about 30% to 50% fewer parameters. Our VGNetG-1.0MP achieves 67.7% top-1 accuracy with 0.99M parameters and 69.2% top-1 accuracy with 1.14M parameters on the ImageNet classification dataset. Furthermore, we demonstrate that edge detectors can replace learnable depthwise convolution layers for mixing features: the N$\times$N kernels are replaced with fixed edge-detection kernels. Our VGNetF-1.5MP achieves 64.4% (-3.2%) top-1 accuracy, and 66.2% (-1.4%) top-1 accuracy with additional Gaussian kernels.
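The following NumPy sketch makes the parameter argument concrete. It is an illustration and not the VGNet code. It counts weights for three layer types: a standard N×N convolution, a depthwise separable convolution, and a variant whose depthwise stage uses a fixed edge-detection kernel. It then applies a fixed kernel depthwise and follows it with a learnable 1×1 pointwise mix. Several details are assumptions for illustration: the function names `param_counts`, `depthwise_fixed` and `pointwise`, the choice of a horizontal Sobel kernel, zero padding, and ignoring biases.

```python
import numpy as np

# Fixed (non-learnable) horizontal Sobel kernel; an assumed choice of edge detector.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])


def param_counts(c_in, c_out, k):
    """Weight counts (biases ignored) for three ways of mapping c_in -> c_out channels."""
    standard = c_in * c_out * k * k          # one k x k filter per (input, output) pair
    separable = c_in * k * k + c_in * c_out  # depthwise k x k + pointwise 1 x 1
    fixed_depthwise = c_in * c_out           # depthwise kernel is fixed; only the 1 x 1 is learned
    return standard, separable, fixed_depthwise


def depthwise_fixed(x, kernel):
    """Cross-correlate every channel of x (C, H, W) with the same fixed kernel, zero 'same' padding."""
    c, h, w = x.shape
    k = kernel.shape[0]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on the spatial axes only
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[:, i:i + h, j:j + w]
    return out


def pointwise(x, weights):
    """1 x 1 convolution: mix channels of x (C_in, H, W) with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weights, x)


if __name__ == "__main__":
    print(param_counts(64, 64, 3))  # (36864, 4672, 4096)
    x = np.zeros((2, 5, 5))
    x[:, :, 3:] = 1.0  # vertical step edge between columns 2 and 3
    edges = depthwise_fixed(x, SOBEL_X)
    y = pointwise(edges, np.ones((4, 2)))
    print(edges[0, 2, 2], y.shape)  # 4.0 (4, 5, 5)
```

With a fixed depthwise kernel, the only learned weights in the block are the C_in·C_out pointwise weights. This is why the count drops below that of an ordinary depthwise separable layer.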
CV · Mar 14, 2023
Training Robust Spiking Neural Networks with ViewPoint Transform and SpatioTemporal Stretching
Haibo Shen, Juyu Xiao, Yihao Luo et al.
Neuromorphic vision sensors (event cameras) simulate biological visual perception systems and have the advantages of high temporal resolution, less data redundancy, low power consumption, and large dynamic range. Since both events and spikes are modeled on neural signals, event cameras are inherently suitable for spiking neural networks (SNNs), which are considered promising models for artificial intelligence (AI) and theoretical neuroscience. However, the unconventional visual signals of these cameras pose a great challenge to the robustness of spiking neural networks. In this paper, we propose a novel data augmentation method, ViewPoint Transform and SpatioTemporal Stretching (VPT-STS). It improves the robustness of SNNs by transforming the rotation centers and angles in the spatiotemporal domain to generate samples from different viewpoints. Furthermore, we introduce spatiotemporal stretching to avoid potential information loss in the viewpoint transformation. Extensive experiments on prevailing neuromorphic datasets demonstrate that VPT-STS is broadly effective across multiple event representations and significantly outperforms pure spatial geometric transformations. Notably, the SNN model with VPT-STS achieves a state-of-the-art accuracy of 84.4% on the DVS-CIFAR10 dataset.
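Viewpoint-style augmentation rests on one basic geometric step: rotating event coordinates about a chosen center. The NumPy sketch below does three things:

- it rotates (x, y) event coordinates by an angle about an arbitrary center;
- it rounds the results back to the pixel grid;
- it discards events that leave the sensor.

This is a generic spatial rotation, not the authors' VPT-STS, whose rotations are defined in the spatiotemporal domain and which adds stretching. Two details are assumptions: the function name `rotate_events` and the event layout (separate x, y, t arrays).

```python
import numpy as np


def rotate_events(x, y, center, angle_deg, width, height):
    """Rotate event pixel coordinates about `center` and drop events that fall off the sensor.

    Returns the rotated integer coordinates of the kept events and a boolean mask,
    so timestamps and polarities can be filtered with the same mask.
    """
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    dx = np.asarray(x, dtype=float) - cx
    dy = np.asarray(y, dtype=float) - cy
    # Standard 2-D rotation of the offset from the center.
    xr = np.rint(cx + dx * np.cos(theta) - dy * np.sin(theta)).astype(int)
    yr = np.rint(cy + dx * np.sin(theta) + dy * np.cos(theta)).astype(int)
    keep = (xr >= 0) & (xr < width) & (yr >= 0) & (yr < height)
    return xr[keep], yr[keep], keep


if __name__ == "__main__":
    x = np.array([5, 7])
    y = np.array([2, 2])
    t = np.array([10, 20])  # a purely spatial rotation leaves timestamps unchanged
    xr, yr, keep = rotate_events(x, y, center=(4, 2), angle_deg=90, width=8, height=4)
    print(xr, yr, t[keep])  # [4] [3] [10]
```

On a non-square sensor, a rotation can push events off the frame, as the second event above shows. This is one concrete way a purely geometric transform loses information.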
CV · Jul 24, 2022
Training Robust Spiking Neural Networks on Neuromorphic Data with Spatiotemporal Fragments
Haibo Shen, Yihao Luo, Xiang Cao et al.
Neuromorphic vision sensors (event cameras) are inherently suitable for spiking neural networks (SNNs) and provide novel neuromorphic vision data for this biomimetic model. Because of the spatiotemporal characteristics of this data, novel data augmentations are required to process the unconventional visual signals of these cameras. In this paper, we propose a novel Event SpatioTemporal Fragments (ESTF) augmentation method. It preserves the continuity of neuromorphic data by drifting or inverting fragments of the spatiotemporal event stream to simulate disturbances caused by brightness variations, leading to more robust spiking neural networks. Extensive experiments are performed on prevailing neuromorphic datasets. The results show that ESTF provides substantial improvements over pure geometric transformations and outperforms other event data augmentation methods. Notably, the SNNs with ESTF achieve a state-of-the-art accuracy of 83.9% on the CIFAR10-DVS dataset.
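The NumPy sketch below shows fragment-level event augmentation. It assumes events are stored as separate arrays t, x, y, p, with polarity in {-1, +1}. Two interpretations are assumptions for illustration, not the ESTF definitions:

- "inverting" a fragment means flipping the polarity of events inside a time window;
- "drifting" a fragment means translating that window's events spatially.

The function names `invert_fragment` and `drift_fragment` and the window parameters are also hypothetical. Events outside the window are left untouched, which keeps the rest of the stream continuous.

```python
import numpy as np


def invert_fragment(t, p, t0, t1):
    """Flip the polarity of events whose timestamp lies in [t0, t1)."""
    p = np.array(p, copy=True)
    window = (t >= t0) & (t < t1)
    p[window] = -p[window]
    return p


def drift_fragment(t, x, y, t0, t1, shift, width, height):
    """Translate events in [t0, t1) by shift = (dx, dy).

    Returns the new x, y arrays and a mask of events that are still on the sensor.
    """
    dx, dy = shift
    x = np.array(x, copy=True)
    y = np.array(y, copy=True)
    window = (t >= t0) & (t < t1)
    x[window] += dx
    y[window] += dy
    keep = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return x, y, keep


if __name__ == "__main__":
    t = np.array([0, 5, 10, 15])
    x = np.array([1, 2, 3, 4])
    y = np.zeros(4, dtype=int)
    p = np.array([1, -1, 1, 1])
    print(invert_fragment(t, p, 5, 15))  # [ 1  1 -1  1]
    print(drift_fragment(t, x, y, 5, 15, (5, 0), 8, 8)[0])  # [1 7 8 4]
```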
NE · Jul 24, 2022
Training Stronger Spiking Neural Networks with Biomimetic Adaptive Internal Association Neurons
Haibo Shen, Yihao Luo, Xiang Cao et al.
As the third generation of neural networks, spiking neural networks (SNNs) are dedicated to exploring more insightful neural mechanisms to achieve near-biological intelligence. Intuitively, biomimetic mechanisms are crucial to understanding and improving SNNs. For example, the associative long-term potentiation (ALTP) phenomenon suggests that, in addition to learning mechanisms between neurons, there are associative effects within neurons. However, most existing methods focus only on the former and do not explore internal association effects. In this paper, we propose a novel Adaptive Internal Association (AIA) neuron model to establish these previously ignored influences within neurons. Consistent with the ALTP phenomenon, the AIA neuron model adapts to input stimuli, and internal associative learning occurs only when both dendrites are stimulated at the same time. In addition, we use weights to measure internal associations and introduce intermediate caches to reduce the volatility of associations. Extensive experiments on prevailing neuromorphic datasets show that the proposed method can potentiate or depress the firing of spikes more specifically, resulting in better performance with fewer spikes. Notably, without adding any parameters at inference, the AIA model achieves state-of-the-art performance on the DVS-CIFAR10 (83.9%) and N-CARS (95.64%) datasets.
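The sketch below is a toy illustration of coincidence-gated internal association. It is not the AIA neuron model: the update rule, `decay`, `threshold`, `lr` and the function name `simulate_toy_neuron` are all hypothetical. It is a leaky integrate-and-fire neuron with two dendritic inputs. An internal association weight grows only on time steps where both dendrites are stimulated, and that weight scales an extra coincidence current.

```python
import numpy as np


def simulate_toy_neuron(d1, d2, decay=0.9, threshold=1.0, lr=0.1):
    """Toy LIF neuron with two dendritic inputs and a hypothetical association term.

    The association weight `a` is updated only when both dendrites receive input on
    the same step; it then adds an extra current a * s1 * s2 to the membrane potential.
    Returns the spike train and the final association weight.
    """
    v, a = 0.0, 0.0
    spikes = []
    for s1, s2 in zip(d1, d2):
        if s1 > 0 and s2 > 0:  # associative update only under simultaneous stimulation
            a += lr * s1 * s2
        v = decay * v + s1 + s2 + a * s1 * s2  # leaky integration plus coincidence current
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return np.array(spikes), a


if __name__ == "__main__":
    _, a_single = simulate_toy_neuron([0.3] * 10, [0.0] * 10)
    _, a_both = simulate_toy_neuron([0.3] * 10, [0.3] * 10)
    print(a_single, a_both)  # association stays 0 with one dendrite, grows with both
```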
CV · Feb 24, 2023
Frequency and Scale Perspectives of Feature Extraction
Liangqi Zhang, Yihao Luo, Xiang Cao et al.
Convolutional neural networks (CNNs) have achieved superior performance, but the nature and properties of their feature extraction remain unclear. In this paper, by analyzing the sensitivity of neural networks to frequencies and scales, we find that neural networks not only have low- and medium-frequency biases but also prefer different frequency bands for different classes, and that the scale of objects influences the preferred frequency bands. These observations lead to the hypothesis that neural networks must learn to extract features at various scales and frequencies. To corroborate this hypothesis, we propose a network architecture based on Gaussian derivatives, which extracts features by constructing a scale space and employing partial derivatives as local feature extraction operators to separate high-frequency information. This manually designed method of extracting features at different scales allows our GSSDNets to achieve accuracy comparable to that of vanilla networks on various datasets.
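The NumPy sketch below illustrates the underlying operator: first-order Gaussian derivatives computed at several scales. It is a generic scale-space computation, not the GSSDNet layer. At each scale σ, the image is smoothed with a sampled Gaussian along one axis and convolved with the Gaussian's derivative along the other, which gives the partial derivatives L_x and L_y of the smoothed image. Three details are assumptions: truncation at 3σ, zero padding, and the function names. The image must be at least as large as the kernel along each axis.

```python
import numpy as np


def gaussian_and_derivative(sigma):
    """Sampled 1-D Gaussian (normalized to sum 1) and its first derivative, truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    u = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-u ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    dg = -u / sigma ** 2 * g  # derivative of the Gaussian with respect to u
    return g, dg


def conv_along(img, kernel, axis):
    """Convolve every 1-D line of `img` along `axis` with `kernel` (zero-padded, same size)."""
    return np.apply_along_axis(lambda line: np.convolve(line, kernel, mode="same"), axis, img)


def scale_space_derivatives(img, sigmas):
    """For each scale sigma, return (Lx, Ly): first partial derivatives of the Gaussian-smoothed image."""
    out = []
    for sigma in sigmas:
        g, dg = gaussian_and_derivative(sigma)
        lx = conv_along(conv_along(img, g, axis=0), dg, axis=1)  # smooth in y, differentiate in x
        ly = conv_along(conv_along(img, g, axis=1), dg, axis=0)  # smooth in x, differentiate in y
        out.append((lx, ly))
    return out


if __name__ == "__main__":
    ramp = np.tile(np.arange(32, dtype=float), (32, 1))  # intensity increases by 1 per column
    for sigma, (lx, ly) in zip((1.0, 2.0), scale_space_derivatives(ramp, (1.0, 2.0))):
        print(sigma, lx[16, 16], ly[16, 16])  # Lx close to 1, Ly close to 0 away from the borders
```

A larger σ suppresses more high-frequency content before differentiation. Stacking several σ values therefore gives a fixed, hand-designed bank of scale- and frequency-selective filters.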
CV · Mar 19, 2021
CE-FPN: Enhancing Channel Information for Object Detection
Yihao Luo, Xiang Cao, Juntao Zhang et al.
Feature pyramid network (FPN) has been an effective framework for extracting multi-scale features in object detection. However, current FPN-based methods mostly suffer from the intrinsic flaw of channel reduction, which causes a loss of semantic information, and the miscellaneous fused feature maps may cause serious aliasing effects. In this paper, we present a novel channel enhancement feature pyramid network (CE-FPN) with three simple yet effective modules to alleviate these problems. Specifically, inspired by sub-pixel convolution, we propose a sub-pixel skip fusion method that performs both channel enhancement and upsampling. By replacing the original 1×1 convolution and linear upsampling, it mitigates the information loss caused by channel reduction. We then propose a sub-pixel context enhancement module for extracting more feature representations, which is superior to other context methods because it exploits rich channel information through sub-pixel convolution. Furthermore, a channel attention guided module is introduced to optimize the final integrated features on each level, alleviating the aliasing effect with only a small computational overhead. Our experiments show that CE-FPN achieves competitive performance compared to state-of-the-art FPN-based detectors on the MS COCO benchmark.
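Sub-pixel convolution has two parts. First, an ordinary convolution produces C·r² channels. Then a fixed "pixel shuffle" rearranges those channels into a map that is r times larger in each spatial dimension. The NumPy sketch below shows only the rearrangement step, using the same index ordering as PyTorch's `torch.nn.PixelShuffle`. The function name `pixel_shuffle` is illustrative, and this sketch does not reproduce how CE-FPN wires the operation into its skip fusion and context modules.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C * r * r, H, W) -> (C, H * r, W * r).

    Output pixel (c, h * r + i, w * r + j) takes input channel c * r * r + i * r + j
    at position (h, w), so channels are traded for resolution without interpolation.
    """
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c = c_r2 // (r * r)
    # Split channels into (c, i, j), interleave i with rows and j with columns.
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


if __name__ == "__main__":
    x = np.arange(16).reshape(4, 2, 2)  # 4 channels = 1 output channel * 2 * 2
    y = pixel_shuffle(x, 2)
    print(y.shape)  # (1, 4, 4)
    print(y[0])
```

Because the shuffle itself has no weights, all learning happens in the preceding convolution. Upsampling therefore reuses channel information instead of discarding it through a 1×1 reduction followed by interpolation.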