Guihua Wen

CV
h-index6
16papers
346citations
Novelty43%
AI Score30

16 Papers

CVNov 21, 2022
Modeling Hierarchical Structural Distance for Unsupervised Domain Adaptation

Yingxue Xu, Guihua Wen, Yang Hu et al.

Unsupervised domain adaptation (UDA) aims to estimate a transferable model for unlabeled target domains by exploiting labeled source data. Optimal Transport (OT) based methods have recently been proven to be a promising solution for UDA with a solid theoretical foundation and competitive performance. However, most of these methods solely focus on domain-level OT alignment by leveraging the geometry of domains for domain-invariant features based on the global embeddings of images. However, global representations of images may destroy image structure, leading to the loss of local details that offer category-discriminative information. This study proposes an end-to-end Deep Hierarchical Optimal Transport method (DeepHOT), which aims to learn both domain-invariant and category-discriminative representations by mining hierarchical structural relations among domains. The main idea is to incorporate a domain-level OT and image-level OT into a unified OT framework, hierarchical optimal transport, to model the underlying geometry in both domain space and image space. In DeepHOT framework, an image-level OT serves as the ground distance metric for the domain-level OT, leading to the hierarchical structural distance. Compared with the ground distance of the conventional domain-level OT, the image-level OT captures structural associations among local regions of images that are beneficial to classification. In this way, DeepHOT, a unified OT framework, not only aligns domains by domain-level OT, but also enhances the discriminative power through image-level OT. Moreover, to overcome the limitation of high computational complexity, we propose a robust and efficient implementation of DeepHOT by approximating origin OT with sliced Wasserstein distance in image-level OT and accomplishing the mini-batch unbalanced domain-level OT.

LGMar 7, 2025
Partial Distribution Alignment via Adaptive Optimal Transport

Pei Yang, Qi Tan, Guihua Wen

To remedy the drawbacks of full-mass or fixed-mass constraints in classical optimal transport, we propose adaptive optimal transport which is distinctive from the classical optimal transport in its ability of adaptive-mass preserving. It aims to answer the mathematical problem of how to transport the probability mass adaptively between probability distributions, which is a fundamental topic in various areas of artificial intelligence. Adaptive optimal transport is able to transfer mass adaptively in the light of the intrinsic structure of the problem itself. The theoretical results shed light on the adaptive mechanism of mass transportation. Furthermore, we instantiate the adaptive optimal transport in machine learning application to align source and target distributions partially and adaptively by respecting the ubiquity of noises, outliers, and distribution shifts in the data. The experiment results on the domain adaptation benchmarks show that the proposed method significantly outperforms the state-of-the-art algorithms.

NEDec 27, 2021
Duck swarm algorithm: theory, numerical optimization, and applications

Mengjian Zhang, Guihua Wen

A swarm intelligence-based optimization algorithm, named Duck Swarm Algorithm (DSA), is proposed in this study, which is inspired by the searching for food sources and foraging behaviors of the duck swarm. Two rules are modeled from the finding food and foraging of the duck, which corresponds to the exploration and exploitation phases of the proposed DSA, respectively. The performance of the DSA is verified by using multiple CEC benchmark functions, where its statistical (best, mean, standard deviation, and average running-time) results are compared with seven well-known algorithms like Particle swarm optimization (PSO), Firefly algorithm (FA), Chicken swarm optimization (CSO), Grey wolf optimizer (GWO), Sine cosine algorithm (SCA), and Marine-predators algorithm (MPA), and Archimedes optimization algorithm (AOA). Moreover, the Wilcoxon rank-sum test, Friedman test, and convergence curves of the comparison results are utilized to prove the superiority of the DSA against other algorithms. The results demonstrate that DSA is a high-performance optimization method in terms of convergence speed and exploration-exploitation balance for solving the numerical optimization problems. Also, DSA is applied for the optimal design of six engineering constrained optimization problems and the node optimization deployment task of the Wireless Sensor Network (WSN). Overall, the comparison results revealed that the DSA is a promising and very competitive algorithm for solving different optimization problems.

LGJun 11, 2021
What Can Knowledge Bring to Machine Learning? -- A Survey of Low-shot Learning for Structured Data

Yang Hu, Adriane Chapman, Guihua Wen et al.

Supervised machine learning has several drawbacks that make it difficult to use in many situations. Drawbacks include: heavy reliance on massive training data, limited generalizability and poor expressiveness of high-level semantics. Low-shot Learning attempts to address these drawbacks. Low-shot learning allows the model to obtain good predictive power with very little or no training data, where structured knowledge plays a key role as a high-level semantic representation of human. This article will review the fundamental factors of low-shot learning technologies, with a focus on the operation of structured knowledge under different low-shot conditions. We also introduce other techniques relevant to low-shot learning. Finally, we point out the limitations of low-shot learning, the prospects and gaps of industrial applications, and future research directions.

CVJun 8, 2020
Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition

Yang Hu, Guihua Wen, Adriane Chapman et al.

Zero-shot learning uses semantic attributes to connect the search space of unseen objects. In recent years, although the deep convolutional network brings powerful visual modeling capabilities to the ZSL task, its visual features have severe pattern inertia and lack of representation of semantic relationships, which leads to severe bias and ambiguity. In response to this, we propose the Graph-based Visual-Semantic Entanglement Network to conduct graph modeling of visual features, which is mapped to semantic attributes by using a knowledge graph, it contains several novel designs: 1. it establishes a multi-path entangled network with the convolutional neural network (CNN) and the graph convolutional network (GCN), which input the visual features from CNN to GCN to model the implicit semantic relations, then GCN feedback the graph modeled information to CNN features; 2. it uses attribute word vectors as the target for the graph semantic modeling of GCN, which forms a self-consistent regression for graph modeling and supervise GCN to learn more personalized attribute relations; 3. it fuses and supplements the hierarchical visual-semantic features refined by graph modeling into visual embedding. Our method outperforms state-of-the-art approaches on multiple representative ZSL datasets: AwA2, CUB, and SUN by promoting the semantic linkage modelling of visual features.

CVMay 13, 2020
Multiple Attentional Pyramid Networks for Chinese Herbal Recognition

Yingxue Xu, Guihua Wen, Yang Hu et al.

Chinese herbs play a critical role in Traditional Chinese Medicine. Due to different recognition granularity, they can be recognized accurately only by professionals with much experience. It is expected that they can be recognized automatically using new techniques like machine learning. However, there is no Chinese herbal image dataset available. Simultaneously, there is no machine learning method which can deal with Chinese herbal image recognition well. Therefore, this paper begins with building a new standard Chinese-Herbs dataset. Subsequently, a new Attentional Pyramid Networks (APN) for Chinese herbal recognition is proposed, where both novel competitive attention and spatial collaborative attention are proposed and then applied. APN can adaptively model Chinese herbal images with different feature scales. Finally, a new framework for Chinese herbal recognition is proposed as a new application of APN. Experiments are conducted on our constructed dataset and validate the effectiveness of our methods.

CVApr 22, 2019
Inner-Imaging Networks: Put Lenses into Convolutional Structure

Yang Hu, Guihua Wen, Mingnan Luo et al.

Despite the tremendous success in computer vision, deep convolutional networks suffer from serious computation costs and redundancies. Although previous works address this issue by enhancing diversities of filters, they have not considered the complementarity and the completeness of the internal structure of the convolutional network. To deal with these problems, a novel Inner-Imaging architecture is proposed in this paper, which allows relationships between channels to meet the above requirement. Specifically, we organize the channel signal points in groups using convolutional kernels to model both the intra-group and inter-group relationships simultaneously. The convolutional filter is a powerful tool for modeling spatial relations and organizing grouped signals, so the proposed methods map the channel signals onto a pseudo-image, like putting a lens into convolution internal structure. Consequently, not only the diversity of channels is increased, but also the complementarity and completeness can be explicitly enhanced. The proposed architecture is lightweight and easy to be implemented. It provides an efficient self-organization strategy for convolutional networks so as to improve their efficiency and performance. Extensive experiments are conducted on multiple benchmark image recognition data sets including CIFAR, SVHN and ImageNet. Experimental results verify the effectiveness of the Inner-Imaging mechanism with the most popular convolutional networks as the backbones.

CVApr 22, 2019
Stochastic Region Pooling: Make Attention More Expressive

Mingnan Luo, Guihua Wen, Yang Hu et al.

Global Average Pooling (GAP) is used by default on the channel-wise attention mechanism to extract channel descriptors. However, the simple global aggregation method of GAP is easy to make the channel descriptors have homogeneity, which weakens the detail distinction between feature maps, thus affecting the performance of the attention mechanism. In this work, we propose a novel method for channel-wise attention network, called Stochastic Region Pooling (SRP), which makes the channel descriptors more representative and diversity by encouraging the feature map to have more or wider important feature responses. Also, SRP is the general method for the attention mechanisms without any additional parameters or computation. It can be widely applied to attention networks without modifying the network structure. Experimental results on image recognition datasets including CIAFR-10/100, ImageNet and three Fine-grained datasets (CUB-200-2011, Stanford Cars and Stanford Dogs) show that SRP brings the significant improvements of the performance over efficient CNNs and achieves the state-of-the-art results.

CVDec 23, 2018
Chinese Herbal Recognition based on Competitive Attentional Fusion of Multi-hierarchies Pyramid Features

Yingxue Xu, Guihua Wen, Yang Hu et al.

Convolution neural netwotks (CNNs) are successfully applied in image recognition task. In this study, we explore the approach of automatic herbal recognition with CNNs and build the standard Chinese herbs datasets firstly. According to the characteristics of herbal images, we proposed the competitive attentional fusion pyramid networks to model the features of herbal image, which mdoels the relationship of feature maps from different levels, and re-weights multi-level channels with channel-wise attention mechanism. In this way, we can dynamically adjust the weight of feature maps from various layers, according to the visual characteristics of each herbal image. Moreover, we also introduce the spatial attention to recalibrate the misaligned features caused by sampling in features amalgamation. Extensive experiments are conducted on our proposed datasets and validate the superior performance of our proposed models. The Chinese herbs datasets will be released upon acceptance to facilitate the research of Chinese herbal recognition.

CVDec 17, 2018
Convolutional herbal prescription building method from multi-scale facial features

Huiqiang Liao, Guihua Wen, Yang Hu et al.

In Traditional Chinese Medicine (TCM), facial features are important basis for diagnosis and treatment. A doctor of TCM can prescribe according to a patient's physical indicators such as face, tongue, voice, symptoms, pulse. Previous works analyze and generate prescription according to symptoms. However, research work to mine the association between facial features and prescriptions has not been found for the time being. In this work, we try to use deep learning methods to mine the relationship between the patient's face and herbal prescriptions (TCM prescriptions), and propose to construct convolutional neural networks that generate TCM prescriptions according to the patient's face image. It is a novel and challenging job. In order to mine features from different granularities of faces, we design a multi-scale convolutional neural network based on three-grained face, which mines the patient's face information from the organs, local regions, and the entire face. Our experiments show that convolutional neural networks can learn relevant information from face to prescribe, and the multi-scale convolutional neural networks based on three-grained face perform better.

CVJul 24, 2018
Competitive Inner-Imaging Squeeze and Excitation for Residual Network

Yang Hu, Guihua Wen, Mingnan Luo et al.

Residual networks, which use a residual unit to supplement the identity mappings, enable very deep convolutional architecture to operate well, however, the residual architecture has been proved to be diverse and redundant, which may leads to low-efficient modeling. In this work, we propose a competitive squeeze-excitation (SE) mechanism for the residual network. Re-scaling the value for each channel in this structure will be determined by the residual and identity mappings jointly, and this design enables us to expand the meaning of channel relationship modeling in residual blocks. Modeling of the competition between residual and identity mappings cause the identity flow to control the complement of the residual feature maps for itself. Furthermore, we design a novel inner-imaging competitive SE block to shrink the consumption and re-image the global features of intermediate network structure, by using the inner-imaging mechanism, we can model the channel-wise relations with convolution in spatial. We carry out experiments on the CIFAR, SVHN, and ImageNet datasets, and the proposed method can challenge state-of-the-art results.

CVMar 1, 2018
Tongue image constitution recognition based on Complexity Perception method

Jiajiong Ma, Guihua Wen, Yang Hu et al.

Background and Object: In China, body constitution is highly related to physiological and pathological functions of human body and determines the tendency of the disease, which is of great importance for treatment in clinical medicine. Tongue diagnosis, as a key part of Traditional Chinese Medicine inspection, is an important way to recognize the type of constitution.In order to deploy tongue image constitution recognition system on non-invasive mobile device to achieve fast, efficient and accurate constitution recognition, an efficient method is required to deal with the challenge of this kind of complex environment. Methods: In this work, we perform the tongue area detection, tongue area calibration and constitution classification using methods which are based on deep convolutional neural network. Subject to the variation of inconstant environmental condition, the distribution of the picture is uneven, which has a bad effect on classification performance. To solve this problem, we propose a method based on the complexity of individual instances to divide dataset into two subsets and classify them separately, which is capable of improving classification accuracy. To evaluate the performance of our proposed method, we conduct experiments on three sizes of tongue datasets, in which deep convolutional neural network method and traditional digital image analysis method are respectively applied to extract features for tongue images. The proposed method is combined with the base classifier Softmax, SVM, and DecisionTree respectively. Results: As the experiments results shown, our proposed method improves the classification accuracy by 1.135% on average and achieves 59.99% constitution classification accuracy. Conclusions: Experimental results on three datasets show that our proposed method can effectively improve the classification accuracy of tongue constitution recognition.

CVMar 1, 2018
Facial Expression Recognition Based on Complexity Perception Classification Algorithm

Tianyuan Chang, Guihua Wen, Yang Hu et al.

Facial expression recognition (FER) has always been a challenging issue in computer vision. The different expressions of emotion and uncontrolled environmental factors lead to inconsistencies in the complexity of FER and variability of between expression categories, which is often overlooked in most facial expression recognition systems. In order to solve this problem effectively, we presented a simple and efficient CNN model to extract facial features, and proposed a complexity perception classification (CPC) algorithm for FER. The CPC algorithm divided the dataset into an easy classification sample subspace and a complex classification sample subspace by evaluating the complexity of facial features that are suitable for classification. The experimental results of our proposed algorithm on Fer2013 and CK-plus datasets demonstrated the algorithm's effectiveness and superiority over other state-of-the-art approaches.

CVJan 23, 2018
Automatic construction of Chinese herbal prescription from tongue image via CNNs and auxiliary latent therapy topics

Yang Hu, Guihua Wen, Huiqiang Liao et al.

The tongue image provides important physical information of humans. It is of great importance for diagnoses and treatments in clinical medicine. Herbal prescriptions are simple, noninvasive and have low side effects. Thus, they are widely applied in China. Studies on the automatic construction technology of herbal prescriptions based on tongue images have great significance for deep learning to explore the relevance of tongue images for herbal prescriptions, it can be applied to healthcare services in mobile medical systems. In order to adapt to the tongue image in a variety of photographic environments and construct herbal prescriptions, a neural network framework for prescription construction is designed. It includes single/double convolution channels and fully connected layers. Furthermore, it proposes the auxiliary therapy topic loss mechanism to model the therapy of Chinese doctors and alleviate the interference of sparse output labels on the diversity of results. The experiment use the real world tongue images and the corresponding prescriptions and the results can generate prescriptions that are close to the real samples, which verifies the feasibility of the proposed method for the automatic construction of herbal prescriptions from tongue images. Also, it provides a reference for automatic herbal prescription construction from more physical information.

CLApr 7, 2017
Conceptualization Topic Modeling

Yi-Kun Tang, Xian-Ling Mao, Heyan Huang et al.

Recently, topic modeling has been widely used to discover the abstract topics in text corpora. Most of the existing topic models are based on the assumption of three-layer hierarchical Bayesian structure, i.e. each document is modeled as a probability distribution over topics, and each topic is a probability distribution over words. However, the assumption is not optimal. Intuitively, it's more reasonable to assume that each topic is a probability distribution over concepts, and then each concept is a probability distribution over words, i.e. adding a latent concept layer between topic layer and word layer in traditional three-layer assumption. In this paper, we verify the proposed assumption by incorporating the new assumption in two representative topic models, and obtain two novel topic models. Extensive experiments were conducted among the proposed models and corresponding baselines, and the results show that the proposed models significantly outperform the baselines in terms of case study and perplexity, which means the new assumption is more reasonable than traditional one.

CVApr 7, 2017
Supervised Deep Hashing for Hierarchical Labeled Data

Dan Wang, Heyan Huang, Chi Lu et al.

Recently, hashing methods have been widely used in large-scale image retrieval. However, most existing hashing methods did not consider the hierarchical relation of labels, which means that they ignored the rich information stored in the hierarchy. Moreover, most of previous works treat each bit in a hash code equally, which does not meet the scenario of hierarchical labeled data. In this paper, we propose a novel deep hashing method, called supervised hierarchical deep hashing (SHDH), to perform hash code learning for hierarchical labeled data. Specifically, we define a novel similarity formula for hierarchical labeled data by weighting each layer, and design a deep convolutional neural network to obtain a hash code for each data point. Extensive experiments on several real-world public datasets show that the proposed method outperforms the state-of-the-art baselines in the image retrieval task.