Lihong Cao

CV
5 papers
27 citations
Novelty 55%
AI Score 29

5 Papers

CV | Apr 27, 2022 | Code
A Multi-Head Convolutional Neural Network With Multi-path Attention improves Image Denoising

Jiahong Zhang, Meijun Qu, Ye Wang et al.

Recently, convolutional neural networks (CNNs) and attention mechanisms have been widely used in image denoising and have achieved satisfactory performance. However, previous works mostly use a single head to receive the noisy image, limiting the richness of the extracted features. Therefore, a novel CNN with multiple heads (MH), named MHCNN, is proposed in this paper, whose heads receive the input image rotated by different angles. MH lets MHCNN simultaneously utilize features of the rotated images to remove noise. To integrate these features effectively, we present a novel multi-path attention mechanism (MPA). Unlike previous attention mechanisms that handle pixel-level, channel-level, or patch-level features, MPA focuses on features at the image level. Experiments show that MHCNN surpasses other state-of-the-art CNN models on additive white Gaussian noise (AWGN) denoising and real-world image denoising. Its peak signal-to-noise ratio (PSNR) results are higher than those of other networks, such as BRDNet, RIDNet, PAN-Net, and CSANN. The code is accessible at https://github.com/JiaHongZ/MHCNN.
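As a rough illustration of two ideas named in this abstract, the sketch below builds rotated copies of one image (the kind of input each head might receive) and computes PSNR. It is not the MHCNN code: the abstract does not state the rotation angles, so multiples of 90 degrees are an assumption, and the function names `rotated_views` and `psnr` are hypothetical.

```python
import numpy as np


def rotated_views(image, ks=(0, 1, 2, 3)):
    """Return copies of `image` rotated by k * 90 degrees.

    Assumption: one view per head, angles restricted to multiples of 90.
    """
    return [np.rot90(image, k=k, axes=(0, 1)) for k in ks]


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

A higher PSNR means the denoised estimate is closer to the clean reference, which is the sense in which the abstract compares networks.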

CV | Mar 16, 2023 | Code
Extracting the Brain-like Representation by an Improved Self-Organizing Map for Image Classification

Jiahong Zhang, Lihong Cao, Moning Zhang et al.

Backpropagation-based supervised learning has achieved great success in computer vision tasks. However, its biological plausibility has long been controversial. Recently, the bio-inspired Hebbian learning rule (HLR) has received extensive attention. The Self-Organizing Map (SOM) uses the competitive HLR to establish connections between neurons, obtaining visual features in an unsupervised way. Although the representation of SOM neurons shows some brain-like characteristics, it is still quite different from the neuron representation in the human visual cortex. This paper proposes an improved SOM with multi-winner, multi-code, and local receptive field, named mlSOM. We observe that the neuron representation of mlSOM is similar to that of the human visual cortex. Furthermore, mlSOM shows a sparse distributed representation of objects, which has also been found in the human inferior temporal area. In addition, experiments show that mlSOM achieves better classification accuracy than the original SOM and other state-of-the-art HLR-based methods. The code is accessible at https://github.com/JiaHongZ/mlSOM.
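For readers unfamiliar with the baseline, here is a minimal sketch of one update step of a classic single-winner SOM with a Gaussian neighborhood. It does not implement mlSOM's multi-winner, multi-code, or local-receptive-field extensions; the function name `som_step` and the parameters `lr` and `sigma` are illustrative assumptions.

```python
import numpy as np


def som_step(weights, x, lr=0.5, sigma=1.0):
    """One classic SOM update.

    weights: array of shape (rows, cols, dim), one weight vector per neuron.
    x:       input vector of shape (dim,).
    Returns the updated weights and the (row, col) index of the winner.
    """
    rows, cols, _ = weights.shape
    # Competition: the neuron whose weights are closest to x wins.
    dists = np.linalg.norm(weights - x, axis=2)
    winner = np.unravel_index(np.argmin(dists), dists.shape)
    # Cooperation: Gaussian neighborhood centered on the winner in grid space.
    rr, cc = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_d2 = (rr - winner[0]) ** 2 + (cc - winner[1]) ** 2
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
    # Adaptation: pull each neuron toward x, scaled by its neighborhood weight.
    new_weights = weights + lr * h[:, :, None] * (x - weights)
    return new_weights, winner
```

No labels appear anywhere in the update, which is what makes the feature learning unsupervised.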

CV | Mar 2, 2023
BIFRNet: A Brain-Inspired Feature Restoration DNN for Partially Occluded Image Recognition

Jiahong Zhang, Lihong Cao, Qiuxia Lai et al.

The partially occluded image recognition (POIR) problem has been a challenge for artificial intelligence for a long time. A common strategy to handle the POIR problem is using the non-occluded features for classification. Unfortunately, this strategy loses effectiveness when the image is severely occluded, since the visible parts can only provide limited information. Several studies in neuroscience reveal that feature restoration, which fills in the occluded information and is called amodal completion, is essential for human brains to recognize partially occluded images. However, feature restoration is commonly ignored by CNNs, which may be the reason why CNNs are ineffective for the POIR problem. Inspired by this, we propose a novel brain-inspired feature restoration network (BIFRNet) to solve the POIR problem. It mimics a ventral visual pathway to extract image features and a dorsal visual pathway to distinguish occluded and visible image regions. In addition, it uses a knowledge module to store object prior knowledge and a completion module to restore occluded features based on visible features and prior knowledge. Thorough experiments on synthetic and real-world occluded image datasets show that BIFRNet outperforms the existing methods in solving the POIR problem. Especially for severely occluded images, BIFRNet surpasses other methods by a large margin and is close to human brain performance. Furthermore, the brain-inspired design makes BIFRNet more interpretable.

CV | Oct 13, 2021
Adversarial Attack across Datasets

Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi et al.

Existing transfer attack methods commonly assume that the attacker knows the training set (e.g., the label set, the input size) of the black-box victim models, which is usually unrealistic because in some cases the attacker cannot know this information. In this paper, we define a Generalized Transferable Attack (GTA) problem where the attacker doesn't know this information and is required to attack any randomly encountered image that may come from an unknown dataset. To solve the GTA problem, we propose a novel Image Classification Eraser (ICE) that trains a particular attacker to erase classification information of any images from arbitrary datasets. Experiments on several datasets demonstrate that ICE greatly outperforms existing transfer attacks on GTA, and show that ICE uses similar texture-like noises to perturb different images from different datasets. Moreover, fast Fourier transform analysis indicates that the main components in each ICE noise are three sine waves for the R, G, and B image channels. Inspired by this interesting finding, we then design a novel Sine Attack (SA) method to optimize the three sine waves. Experiments show that SA performs comparably to ICE, indicating that the three sine waves are effective and sufficient to break DNNs under the GTA setting.
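To make the "three sine waves, one per RGB channel" observation concrete, the sketch below constructs such a perturbation and adds it to an image. It only illustrates the shape of the noise; it is not the authors' SA method, which optimizes the waves. The function names, the frequency and phase parameterization, and the bound `eps` are all assumptions made for illustration.

```python
import numpy as np


def sine_perturbation(height, width, freqs, phases, eps=8.0):
    """Build an (H, W, 3) perturbation with one 2-D sine wave per channel.

    freqs:  three (fy, fx) pairs, in cycles per image (hypothetical values).
    phases: three phase offsets in radians (hypothetical values).
    eps:    amplitude bound, so every entry lies in [-eps, eps].
    """
    ys, xs = np.meshgrid(
        np.arange(height) / height, np.arange(width) / width, indexing="ij"
    )
    channels = [
        eps * np.sin(2.0 * np.pi * (fy * ys + fx * xs) + phase)
        for (fy, fx), phase in zip(freqs, phases)
    ]
    return np.stack(channels, axis=-1)


def apply_perturbation(image, delta):
    """Add the perturbation and clip back to the valid 8-bit pixel range."""
    return np.clip(image.astype(np.float64) + delta, 0.0, 255.0)
```

Taking `np.fft.fft2` of a single channel of such a perturbation shows the energy concentrated at the chosen frequency, which is the kind of signature the abstract reports finding in the ICE noise.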

NE | Jul 17, 2020
A Biologically Plausible Audio-Visual Integration Model for Continual Learning

Wenjie Chen, Fengtong Du, Ye Wang et al.

The problem of catastrophic forgetting has a history of more than 30 years and has not been completely solved yet. Since the human brain has a natural ability to perform continual lifelong learning, learning from the brain may provide solutions to this problem. In this paper, we propose a novel biologically plausible audio-visual integration model (AVIM) based on the assumption that the integration of audio and visual perceptual information in the medial temporal lobe during learning is crucial to form concepts and make continual learning possible. Specifically, we use multi-compartment Hodgkin-Huxley neurons to build the model and adopt calcium-based synaptic tagging and capture as the model's learning rule. Furthermore, we define a new continual learning paradigm to simulate the possible continual learning process in the human brain. We then test our model under this new paradigm. Our experimental results show that the proposed AVIM can achieve state-of-the-art continual learning performance compared with other advanced methods such as OWM, iCaRL, and GEM. Moreover, it can generate stable representations of objects during learning. These results support our assumption that concept formation is essential for continuous lifelong learning and suggest that the proposed AVIM is a possible concept formation mechanism.