Yang Hu

CV
h-index27
25papers
616citations
Novelty45%
AI Score39

25 Papers

4.8CVNov 21, 2022
Modeling Hierarchical Structural Distance for Unsupervised Domain Adaptation

Yingxue Xu, Guihua Wen, Yang Hu et al.

Unsupervised domain adaptation (UDA) aims to estimate a transferable model for unlabeled target domains by exploiting labeled source data. Optimal Transport (OT) based methods have recently been proven to be a promising solution for UDA with a solid theoretical foundation and competitive performance. However, most of these methods solely focus on domain-level OT alignment by leveraging the geometry of domains for domain-invariant features based on the global embeddings of images. However, global representations of images may destroy image structure, leading to the loss of local details that offer category-discriminative information. This study proposes an end-to-end Deep Hierarchical Optimal Transport method (DeepHOT), which aims to learn both domain-invariant and category-discriminative representations by mining hierarchical structural relations among domains. The main idea is to incorporate a domain-level OT and image-level OT into a unified OT framework, hierarchical optimal transport, to model the underlying geometry in both domain space and image space. In DeepHOT framework, an image-level OT serves as the ground distance metric for the domain-level OT, leading to the hierarchical structural distance. Compared with the ground distance of the conventional domain-level OT, the image-level OT captures structural associations among local regions of images that are beneficial to classification. In this way, DeepHOT, a unified OT framework, not only aligns domains by domain-level OT, but also enhances the discriminative power through image-level OT. Moreover, to overcome the limitation of high computational complexity, we propose a robust and efficient implementation of DeepHOT by approximating origin OT with sliced Wasserstein distance in image-level OT and accomplishing the mini-batch unbalanced domain-level OT.

1.2QMSep 7, 2023
Beyond attention: deriving biologically interpretable insights from weakly-supervised multiple-instance learning models

Willem Bonnaffé, CRUK ICGC Prostate Group, Freddie Hamdy et al.

Recent advances in attention-based multiple instance learning (MIL) have improved our insights into the tissue regions that models rely on to make predictions in digital pathology. However, the interpretability of these approaches is still limited. In particular, they do not report whether high-attention regions are positively or negatively associated with the class labels or how well these regions correspond to previously established clinical and biological knowledge. We address this by introducing a post-training methodology to analyse MIL models. Firstly, we introduce prediction-attention-weighted (PAW) maps by combining tile-level attention and prediction scores produced by a refined encoder, allowing us to quantify the predictive contribution of high-attention regions. Secondly, we introduce a biological feature instantiation technique by integrating PAW maps with nuclei segmentation masks. This further improves interpretability by providing biologically meaningful features related to the cellular organisation of the tissue and facilitates comparisons with known clinical features. We illustrate the utility of our approach by comparing PAW maps obtained for prostate cancer diagnosis (i.e. samples containing malignant tissue, 381/516 tissue samples) and prognosis (i.e. samples from patients with biochemical recurrence following surgery, 98/663 tissue samples) in a cohort of patients from the international cancer genome consortium (ICGC UK Prostate Group). Our approach reveals that regions that are predictive of adverse prognosis do not tend to co-locate with the tumour regions, indicating that non-cancer cells should also be studied when evaluating prognosis.

4.9CLOct 14, 2025
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts

Yushu Zhao, Yubin Qin, Yang Wang et al.

Mixture-of-Experts (MoE) models have recently demonstrated exceptional performance across a diverse range of applications. The principle of sparse activation in MoE models facilitates an offloading strategy, wherein active experts are maintained in GPU HBM, while inactive experts are stored in CPU DRAM. The efficacy of this approach, however, is fundamentally constrained by the limited bandwidth of the CPU-GPU interconnect. To mitigate this bottleneck, existing approaches have employed prefetching to accelerate MoE inference. These methods attempt to predict and prefetch the required experts using specially trained modules. Nevertheless, such techniques are often encumbered by significant training overhead and have shown diminished effectiveness on recent MoE models with fine-grained expert segmentation. In this paper, we propose MoBiLE, a plug-and-play offloading-based MoE inference framework with \textit{mixture of big-little experts}. It reduces the number of experts for unimportant tokens to half for acceleration while maintaining full experts for important tokens to guarantee model quality. Further, a dedicated fallback and prefetching mechanism is designed for switching between little and big experts to improve memory efficiency. We evaluate MoBiLE on four typical modern MoE architectures and challenging generative tasks. Our results show that MoBiLE achieves a speedup of 1.60x to 1.72x compared to the baseline on a consumer GPU system, with negligible degradation in accuracy.

3.7CVDec 28, 2024
MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion

Zechao Zhan, Dehong Gao, Jinxia Zhang et al.

Text-guided image editing model has achieved great success in general domain. However, directly applying these models to the fashion domain may encounter two issues: (1) Inaccurate localization of editing region; (2) Weak editing magnitude. To address these issues, the MADiff model is proposed. Specifically, to more accurately identify editing region, the MaskNet is proposed, in which the foreground region, densepose and mask prompts from large language model are fed into a lightweight UNet to predict the mask for editing region. To strengthen the editing magnitude, the Attention-Enhanced Diffusion Model is proposed, where the noise map, attention map, and the mask from MaskNet are fed into the proposed Attention Processor to produce a refined noise map. By integrating the refined noise map into the diffusion model, the edited image can better align with the target prompt. Given the absence of benchmarks in fashion image editing, we constructed a dataset named Fashion-E, comprising 28390 image-text pairs in the training set, and 2639 image-text pairs for four types of fashion tasks in the evaluation set. Extensive experiments on Fashion-E demonstrate that our proposed method can accurately predict the mask of editing region and significantly enhance editing magnitude in fashion image editing compared to the state-of-the-art methods.

3.7CVDec 28, 2024
FashionFAE: Fine-grained Attributes Enhanced Fashion Vision-Language Pre-training

Jiale Huang, Dehong Gao, Jinxia Zhang et al.

Large-scale Vision-Language Pre-training (VLP) has demonstrated remarkable success in the general domain. However, in the fashion domain, items are distinguished by fine-grained attributes like texture and material, which are crucial for tasks such as retrieval. Existing models often fail to leverage these fine-grained attributes from both text and image modalities. To address the above issues, we propose a novel approach for the fashion domain, Fine-grained Attributes Enhanced VLP (FashionFAE), which focuses on the detailed characteristics of fashion data. An attribute-emphasized text prediction task is proposed to predict fine-grained attributes of the items. This forces the model to focus on the salient attributes from the text modality. Additionally, a novel attribute-promoted image reconstruction task is proposed, which further enhances the fine-grained ability of the model by leveraging the representative attributes from the image modality. Extensive experiments show that FashionFAE significantly outperforms State-Of-The-Art (SOTA) methods, achieving 2.9% and 5.2% improvements in retrieval on sub-test and full test sets, respectively, and a 1.6% average improvement in recognition tasks.

8.8CVJan 25, 2022
RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals

Zhi Wu, Dongheng Zhang, Chunyang Xie et al.

Human silhouette segmentation, which is originally defined in computer vision, has achieved promising results for understanding human activities. However, the physical limitation makes existing systems based on optical cameras suffer from severe performance degradation under low illumination, smoke, and/or opaque obstruction conditions. To overcome such limitations, in this paper, we propose to utilize the radio signals, which can traverse obstacles and are unaffected by the lighting conditions to achieve silhouette segmentation. The proposed RFMask framework is composed of three modules. It first transforms RF signals captured by millimeter wave radar on two planes into spatial domain and suppress interference with the signal processing module. Then, it locates human reflections on RF frames and extract features from surrounding signals with human detection module. Finally, the extracted features from RF frames are aggregated with an attention based mask generation module. To verify our proposed framework, we collect a dataset containing 804,760 radio frames and 402,380 camera frames with human activities under various scenes. Experimental results show that the proposed framework can achieve impressive human silhouette segmentation even under the challenging scenarios(such as low light and occlusion scenarios) where traditional optical-camera-based methods fail. To the best of our knowledge, this is the first investigation towards segmenting human silhouette based on millimeter wave signals. We hope that our work can serve as a baseline and inspire further research that perform vision tasks with radio signals. The dataset and codes will be made in public.

3.7CVAug 25, 2021
Heredity-aware Child Face Image Generation with Latent Space Disentanglement

Xiao Cui, Wengang Zhou, Yang Hu et al.

Generative adversarial networks have been widely used in image synthesis in recent years and the quality of the generated image has been greatly improved. However, the flexibility to control and decouple facial attributes (e.g., eyes, nose, mouth) is still limited. In this paper, we propose a novel approach, called ChildGAN, to generate a child's image according to the images of parents with heredity prior. The main idea is to disentangle the latent space of a pre-trained generation model and precisely control the face attributes of child images with clear semantics. We use distances between face landmarks as pseudo labels to figure out the most influential semantic vectors of the corresponding face attributes by calculating the gradient of latent vectors to pseudo labels. Furthermore, we disentangle the semantic vectors by weighting irrelevant features and orthogonalizing them with Schmidt Orthogonalization. Finally, we fuse the latent vector of the parents by leveraging the disentangled semantic vectors under the guidance of biological genetic laws. Extensive experiments demonstrate that our approach outperforms the existing methods with encouraging results.

9.2LGJun 11, 2021
What Can Knowledge Bring to Machine Learning? -- A Survey of Low-shot Learning for Structured Data

Yang Hu, Adriane Chapman, Guihua Wen et al.

Supervised machine learning has several drawbacks that make it difficult to use in many situations. Drawbacks include: heavy reliance on massive training data, limited generalizability and poor expressiveness of high-level semantics. Low-shot Learning attempts to address these drawbacks. Low-shot learning allows the model to obtain good predictive power with very little or no training data, where structured knowledge plays a key role as a high-level semantic representation of human. This article will review the fundamental factors of low-shot learning technologies, with a focus on the operation of structured knowledge under different low-shot conditions. We also introduce other techniques relevant to low-shot learning. Finally, we point out the limitations of low-shot learning, the prospects and gaps of industrial applications, and future research directions.

9.6CVJun 8, 2020
Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition

Yang Hu, Guihua Wen, Adriane Chapman et al.

Zero-shot learning uses semantic attributes to connect the search space of unseen objects. In recent years, although the deep convolutional network brings powerful visual modeling capabilities to the ZSL task, its visual features have severe pattern inertia and lack of representation of semantic relationships, which leads to severe bias and ambiguity. In response to this, we propose the Graph-based Visual-Semantic Entanglement Network to conduct graph modeling of visual features, which is mapped to semantic attributes by using a knowledge graph, it contains several novel designs: 1. it establishes a multi-path entangled network with the convolutional neural network (CNN) and the graph convolutional network (GCN), which input the visual features from CNN to GCN to model the implicit semantic relations, then GCN feedback the graph modeled information to CNN features; 2. it uses attribute word vectors as the target for the graph semantic modeling of GCN, which forms a self-consistent regression for graph modeling and supervise GCN to learn more personalized attribute relations; 3. it fuses and supplements the hierarchical visual-semantic features refined by graph modeling into visual embedding. Our method outperforms state-of-the-art approaches on multiple representative ZSL datasets: AwA2, CUB, and SUN by promoting the semantic linkage modelling of visual features.

1.2CVMay 13, 2020
Multiple Attentional Pyramid Networks for Chinese Herbal Recognition

Yingxue Xu, Guihua Wen, Yang Hu et al.

Chinese herbs play a critical role in Traditional Chinese Medicine. Due to different recognition granularity, they can be recognized accurately only by professionals with much experience. It is expected that they can be recognized automatically using new techniques like machine learning. However, there is no Chinese herbal image dataset available. Simultaneously, there is no machine learning method which can deal with Chinese herbal image recognition well. Therefore, this paper begins with building a new standard Chinese-Herbs dataset. Subsequently, a new Attentional Pyramid Networks (APN) for Chinese herbal recognition is proposed, where both novel competitive attention and spatial collaborative attention are proposed and then applied. APN can adaptively model Chinese herbal images with different feature scales. Finally, a new framework for Chinese herbal recognition is proposed as a new application of APN. Experiments are conducted on our constructed dataset and validate the effectiveness of our methods.

1.2CVApr 19, 2020
Data Augmentation Imbalance For Imbalanced Attribute Classification

Yang Hu, Xiaying Bai, Pan Zhou et al.

Pedestrian attribute recognition is an important multi-label classification problem. Although the convolutional neural networks are prominent in learning discriminative features from images, the data imbalance in multi-label setting for fine-grained tasks remains an open problem. In this paper, we propose a new re-sampling algorithm called: data augmentation imbalance (DAI) to explicitly enhance the ability to discriminate the fewer attributes via increasing the proportion of labels accounting for a small part. Fundamentally, by applying over-sampling and under-sampling on the multi-label dataset at the same time, the thought of robbing the rich attributes and helping the poor makes a significant contribution to DAI. Extensive empirical evidence shows that our DAI algorithm achieves state-of-the-art results, based on pedestrian attribute datasets, i.e. standard PA-100K and PETA datasets.

0.9CVApr 22, 2019Code
Inner-Imaging Networks: Put Lenses into Convolutional Structure

Yang Hu, Guihua Wen, Mingnan Luo et al.

Despite the tremendous success in computer vision, deep convolutional networks suffer from serious computation costs and redundancies. Although previous works address this issue by enhancing diversities of filters, they have not considered the complementarity and the completeness of the internal structure of the convolutional network. To deal with these problems, a novel Inner-Imaging architecture is proposed in this paper, which allows relationships between channels to meet the above requirement. Specifically, we organize the channel signal points in groups using convolutional kernels to model both the intra-group and inter-group relationships simultaneously. The convolutional filter is a powerful tool for modeling spatial relations and organizing grouped signals, so the proposed methods map the channel signals onto a pseudo-image, like putting a lens into convolution internal structure. Consequently, not only the diversity of channels is increased, but also the complementarity and completeness can be explicitly enhanced. The proposed architecture is lightweight and easy to be implemented. It provides an efficient self-organization strategy for convolutional networks so as to improve their efficiency and performance. Extensive experiments are conducted on multiple benchmark image recognition data sets including CIFAR, SVHN and ImageNet. Experimental results verify the effectiveness of the Inner-Imaging mechanism with the most popular convolutional networks as the backbones.

3.4CVApr 22, 2019
Stochastic Region Pooling: Make Attention More Expressive

Mingnan Luo, Guihua Wen, Yang Hu et al.

Global Average Pooling (GAP) is used by default on the channel-wise attention mechanism to extract channel descriptors. However, the simple global aggregation method of GAP is easy to make the channel descriptors have homogeneity, which weakens the detail distinction between feature maps, thus affecting the performance of the attention mechanism. In this work, we propose a novel method for channel-wise attention network, called Stochastic Region Pooling (SRP), which makes the channel descriptors more representative and diversity by encouraging the feature map to have more or wider important feature responses. Also, SRP is the general method for the attention mechanisms without any additional parameters or computation. It can be widely applied to attention networks without modifying the network structure. Experimental results on image recognition datasets including CIAFR-10/100, ImageNet and three Fine-grained datasets (CUB-200-2011, Stanford Cars and Stanford Dogs) show that SRP brings the significant improvements of the performance over efficient CNNs and achieves the state-of-the-art results.

0.9CVDec 23, 2018
Chinese Herbal Recognition based on Competitive Attentional Fusion of Multi-hierarchies Pyramid Features

Yingxue Xu, Guihua Wen, Yang Hu et al.

Convolution neural netwotks (CNNs) are successfully applied in image recognition task. In this study, we explore the approach of automatic herbal recognition with CNNs and build the standard Chinese herbs datasets firstly. According to the characteristics of herbal images, we proposed the competitive attentional fusion pyramid networks to model the features of herbal image, which mdoels the relationship of feature maps from different levels, and re-weights multi-level channels with channel-wise attention mechanism. In this way, we can dynamically adjust the weight of feature maps from various layers, according to the visual characteristics of each herbal image. Moreover, we also introduce the spatial attention to recalibrate the misaligned features caused by sampling in features amalgamation. Extensive experiments are conducted on our proposed datasets and validate the superior performance of our proposed models. The Chinese herbs datasets will be released upon acceptance to facilitate the research of Chinese herbal recognition.

1.7CVDec 17, 2018
Convolutional herbal prescription building method from multi-scale facial features

Huiqiang Liao, Guihua Wen, Yang Hu et al.

In Traditional Chinese Medicine (TCM), facial features are important basis for diagnosis and treatment. A doctor of TCM can prescribe according to a patient's physical indicators such as face, tongue, voice, symptoms, pulse. Previous works analyze and generate prescription according to symptoms. However, research work to mine the association between facial features and prescriptions has not been found for the time being. In this work, we try to use deep learning methods to mine the relationship between the patient's face and herbal prescriptions (TCM prescriptions), and propose to construct convolutional neural networks that generate TCM prescriptions according to the patient's face image. It is a novel and challenging job. In order to mine features from different granularities of faces, we design a multi-scale convolutional neural network based on three-grained face, which mines the patient's face information from the organs, local regions, and the entire face. Our experiments show that convolutional neural networks can learn relevant information from face to prescribe, and the multi-scale convolutional neural networks based on three-grained face perform better.

5.6AIOct 12, 2018
Feature Learning for Fault Detection in High-Dimensional Condition-Monitoring Signals

Gabriel Michau, Yang Hu, Thomas Palmé et al.

Complex industrial systems are continuously monitored by a large number of heterogeneous sensors. The diversity of their operating conditions and the possible fault types make it impossible to collect enough data for learning all the possible fault patterns. The paper proposes an integrated automatic unsupervised feature learning and one-class classification for fault detection that uses data on healthy conditions only for its training. The approach is based on stacked Extreme Learning Machines (namely Hierarchical, or HELM) and comprises an autoencoder, performing unsupervised feature learning, stacked with a one-class classifier monitoring the distance of the test data to the training healthy class, thereby assessing the health of the system. This study provides a comprehensive evaluation of HELM fault detection capability compared to other machine learning approaches, such as stand-alone one-class classifiers (ELM and SVM), these same one-class classifiers combined with traditional dimensionality reduction methods (PCA) and a Deep Belief Network. The performance is first evaluated on a synthetic dataset that encompasses typical characteristics of condition monitoring data. Subsequently, the approach is evaluated on a real case study of a power plant fault. The proposed algorithm for fault detection, combining feature learning with the one-class classifier, demonstrates a better performance, particularly in cases where condition monitoring data contain several non-informative signals.

6.3CVOct 4, 2018
FashionNet: Personalized Outfit Recommendation with Deep Neural Network

Tong He, Yang Hu

With the rapid growth of fashion-focused social networks and online shopping, intelligent fashion recommendation is now in great need. We design algorithms which automatically suggest users outfits (e.g. a shirt, together with a skirt and a pair of high-heel shoes), that fit their personal fashion preferences. Recommending sets, each of which is composed of multiple interacted items, is relatively new to recommender systems, which usually recommend individual items to users. We explore the use of deep networks for this challenging task. Our system, dubbed FashionNet, consists of two components, a feature network for feature extraction and a matching network for compatibility computation. The former is achieved through a deep convolutional network. And for the latter, we adopt a multi-layer fully-connected network structure. We design and compare three alternative architectures for FashionNet. To achieve personalized recommendation, we develop a two-stage training strategy, which uses the fine-tuning technique to transfer a general compatibility model to a model that embeds personal preference. Experiments on a large scale data set collected from a popular fashion-focused social network validate the effectiveness of the proposed networks.

1.8NESep 4, 2018
Metabolize Neural Network

Dan Dai, Zhiwen Yu, Yang Hu et al.

The metabolism of cells is the most basic and important part of human function. Neural networks in deep learning stem from neuronal activity. It is self-evident that the significance of metabolize neuronal network(MetaNet) in model construction. In this study, we explore neuronal metabolism for shallow network from proliferation and autophagy two aspects. First, we propose different neuron proliferate methods that constructive the selfgrowing network in metabolism cycle. Proliferate neurons alleviate resources wasting and insufficient model learning problem when network initializes more or less parameters. Then combined with autophagy mechanism in the process of model self construction to ablate under-expressed neurons. The MetaNet can automatically determine the number of neurons during training, further, save more resource consumption. We verify the performance of the proposed methods on datasets: MNIST, Fashion-MNIST and CIFAR-10.

9.1CVJul 24, 2018Code
Competitive Inner-Imaging Squeeze and Excitation for Residual Network

Yang Hu, Guihua Wen, Mingnan Luo et al.

Residual networks, which use a residual unit to supplement the identity mappings, enable very deep convolutional architecture to operate well, however, the residual architecture has been proved to be diverse and redundant, which may leads to low-efficient modeling. In this work, we propose a competitive squeeze-excitation (SE) mechanism for the residual network. Re-scaling the value for each channel in this structure will be determined by the residual and identity mappings jointly, and this design enables us to expand the meaning of channel relationship modeling in residual blocks. Modeling of the competition between residual and identity mappings cause the identity flow to control the complement of the residual feature maps for itself. Furthermore, we design a novel inner-imaging competitive SE block to shrink the consumption and re-image the global features of intermediate network structure, by using the inner-imaging mechanism, we can model the channel-wise relations with convolution in spatial. We carry out experiments on the CIFAR, SVHN, and ImageNet datasets, and the proposed method can challenge state-of-the-art results.

0.9CVMar 1, 2018
Tongue image constitution recognition based on Complexity Perception method

Jiajiong Ma, Guihua Wen, Yang Hu et al.

Background and Object: In China, body constitution is highly related to physiological and pathological functions of human body and determines the tendency of the disease, which is of great importance for treatment in clinical medicine. Tongue diagnosis, as a key part of Traditional Chinese Medicine inspection, is an important way to recognize the type of constitution.In order to deploy tongue image constitution recognition system on non-invasive mobile device to achieve fast, efficient and accurate constitution recognition, an efficient method is required to deal with the challenge of this kind of complex environment. Methods: In this work, we perform the tongue area detection, tongue area calibration and constitution classification using methods which are based on deep convolutional neural network. Subject to the variation of inconstant environmental condition, the distribution of the picture is uneven, which has a bad effect on classification performance. To solve this problem, we propose a method based on the complexity of individual instances to divide dataset into two subsets and classify them separately, which is capable of improving classification accuracy. To evaluate the performance of our proposed method, we conduct experiments on three sizes of tongue datasets, in which deep convolutional neural network method and traditional digital image analysis method are respectively applied to extract features for tongue images. The proposed method is combined with the base classifier Softmax, SVM, and DecisionTree respectively. Results: As the experiments results shown, our proposed method improves the classification accuracy by 1.135% on average and achieves 59.99% constitution classification accuracy. Conclusions: Experimental results on three datasets show that our proposed method can effectively improve the classification accuracy of tongue constitution recognition.

4.6CVMar 1, 2018
Facial Expression Recognition Based on Complexity Perception Classification Algorithm

Tianyuan Chang, Guihua Wen, Yang Hu et al.

Facial expression recognition (FER) has always been a challenging issue in computer vision. The different expressions of emotion and uncontrolled environmental factors lead to inconsistencies in the complexity of FER and variability of between expression categories, which is often overlooked in most facial expression recognition systems. In order to solve this problem effectively, we presented a simple and efficient CNN model to extract facial features, and proposed a complexity perception classification (CPC) algorithm for FER. The CPC algorithm divided the dataset into an easy classification sample subspace and a complex classification sample subspace by evaluating the complexity of facial features that are suitable for classification. The experimental results of our proposed algorithm on Fer2013 and CK-plus datasets demonstrated the algorithm's effectiveness and superiority over other state-of-the-art approaches.

4.6CVJan 23, 2018
Automatic construction of Chinese herbal prescription from tongue image via CNNs and auxiliary latent therapy topics

Yang Hu, Guihua Wen, Huiqiang Liao et al.

The tongue image provides important physical information of humans. It is of great importance for diagnoses and treatments in clinical medicine. Herbal prescriptions are simple, noninvasive and have low side effects. Thus, they are widely applied in China. Studies on the automatic construction technology of herbal prescriptions based on tongue images have great significance for deep learning to explore the relevance of tongue images for herbal prescriptions, it can be applied to healthcare services in mobile medical systems. In order to adapt to the tongue image in a variety of photographic environments and construct herbal prescriptions, a neural network framework for prescription construction is designed. It includes single/double convolution channels and fully connected layers. Furthermore, it proposes the auxiliary therapy topic loss mechanism to model the therapy of Chinese doctors and alleviate the interference of sparse output labels on the diversity of results. The experiment use the real world tongue images and the corresponding prescriptions and the results can generate prescriptions that are close to the real samples, which verifies the feasibility of the proposed method for the automatic construction of herbal prescriptions from tongue images. Also, it provides a reference for automatic herbal prescription construction from more physical information.

3.1CVJun 19, 2017
Endoscopic Depth Measurement and Super-Spectral-Resolution Imaging

Jianyu Lin, Neil T. Clancy, Yang Hu et al.

Intra-operative measurements of tissue shape and multi/ hyperspectral information have the potential to provide surgical guidance and decision making support. We report an optical probe based system to combine sparse hyperspectral measurements and spectrally-encoded structured lighting (SL) for surface measurements. The system provides informative signals for navigation with a surgical interface. By rapidly switching between SL and white light (WL) modes, SL information is combined with structure-from-motion (SfM) from white light images, based on SURF feature detection and Lucas-Kanade (LK) optical flow to provide quasi-dense surface shape reconstruction with known scale in real-time. Furthermore, "super-spectral-resolution" was realized, whereby the RGB images and sparse hyperspectral data were integrated to recover dense pixel-level hyperspectral stacks, by using convolutional neural networks to upscale the wavelength dimension. Validation and demonstration of this system is reported on ex vivo/in vivo animal/ human experiments.

14.1CVAug 5, 2014
Open-set Person Re-identification

Shengcai Liao, Zhipeng Mo, Jianqing Zhu et al.

Person re-identification is becoming a hot research for developing both machine learning algorithms and video surveillance applications. The task of person re-identification is to determine which person in a gallery has the same identity to a probe image. This task basically assumes that the subject of the probe image belongs to the gallery, that is, the gallery contains this person. However, in practical applications such as searching a suspect in a video, this assumption is usually not true. In this paper, we consider the open-set person re-identification problem, which includes two sub-tasks, detection and identification. The detection sub-task is to determine the presence of the probe subject in the gallery, and the identification sub-task is to determine which person in the gallery has the same identity as the accepted probe. We present a database collected from a video surveillance setting of 6 cameras, with 200 persons and 7,413 images segmented. Based on this database, we develop a benchmark protocol for evaluating the performance under the open-set person re-identification scenario. Several popular metric learning algorithms for person re-identification have been evaluated as baselines. From the baseline performance, we observe that the open-set person re-identification problem is still largely unresolved, thus further attention and effort is needed.

43.8CVJun 17, 2014
Person Re-identification by Local Maximal Occurrence Representation and Metric Learning

Shengcai Liao, Yang Hu, Xiangyu Zhu et al.

Person re-identification is an important technique towards automatic search of a person's presence in a surveillance video. Two fundamental problems are critical for person re-identification, feature representation and metric learning. An effective feature representation should be robust to illumination and viewpoint changes, and a discriminant metric should be learned to match various person images. In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). The LOMO feature analyzes the horizontal occurrence of local features, and maximizes the occurrence to make a stable representation against viewpoint changes. Besides, to handle illumination variations, we apply the Retinex transform and a scale invariant texture operator. To learn a discriminant metric, we propose to learn a discriminant low dimensional subspace by cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is learned on the derived subspace. We also present a practical computation method for XQDA, as well as its regularization. Experiments on four challenging person re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show that the proposed method improves the state-of-the-art rank-1 identification rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively.