Zhi-Hong Mao

CV
h-index28
7papers
42citations
Novelty41%
AI Score25

7 Papers

CVAug 23, 2024
Shape-Preserving Generation of Food Images for Automatic Dietary Assessment

Guangzong Chen, Zhi-Hong Mao, Mingui Sun et al.

Traditional dietary assessment methods heavily rely on self-reporting, which is time-consuming and prone to bias. Recent advancements in Artificial Intelligence (AI) have revealed new possibilities for dietary assessment, particularly through analysis of food images. Recognizing foods and estimating food volumes from images are known as the key procedures for automatic dietary assessment. However, both procedures required large amounts of training images labeled with food names and volumes, which are currently unavailable. Alternatively, recent studies have indicated that training images can be artificially generated using Generative Adversarial Networks (GANs). Nonetheless, convenient generation of large amounts of food images with known volumes remain a challenge with the existing techniques. In this work, we present a simple GAN-based neural network architecture for conditional food image generation. The shapes of the food and container in the generated images closely resemble those in the reference input image. Our experiments demonstrate the realism of the generated images and shape-preserving capabilities of the proposed framework.

LGOct 12, 2023
A Recent Survey of Heterogeneous Transfer Learning

Runxue Bao, Yiming Sun, Yuhe Gao et al.

The application of transfer learning, leveraging knowledge from source domains to enhance model performance in a target domain, has significantly grown, supporting diverse real-world applications. Its success often relies on shared knowledge between domains, typically required in these methodologies. Commonly, methods assume identical feature and label spaces in both domains, known as homogeneous transfer learning. However, this is often impractical as source and target domains usually differ in these spaces, making precise data matching challenging and costly. Consequently, heterogeneous transfer learning (HTL), which addresses these disparities, has become a vital strategy in various tasks. In this paper, we offer an extensive review of over 60 HTL methods, covering both data-based and model-based approaches. We describe the key assumptions and algorithms of these methods and systematically categorize them into instance-based, feature representation-based, parameter regularization, and parameter tuning techniques. Additionally, we explore applications in natural language processing, computer vision, multimodal learning, and biomedicine, aiming to deepen understanding and stimulate further research in these areas. Our paper includes recent advancements in HTL, such as the introduction of transformer-based models and multimodal learning techniques, ensuring the review captures the latest developments in the field. We identify key limitations in current HTL studies and offer systematic guidance for future research, highlighting areas needing further exploration and suggesting potential directions for advancing the field.

LGAug 17, 2022
Sampling Through the Lens of Sequential Decision Making

Jason Xiaotian Dou, Alvin Qingkai Pan, Runxue Bao et al.

Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics. They cannot choose the best sample for model training in different stages. Inspired by "Think, Fast and Slow" (System 1 and System 2) in cognitive science, we propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR) to tackle this challenge. To the best of our knowledge, this is the first work utilizing reinforcement learning (RL) to address the sampling problem in representation learning. Our approach optimally adjusts the sampling process to achieve optimal performance. We explore geographical relationships among samples by distance-based sampling to maximize overall cumulative reward. We apply ASR to the long-standing sampling problems in similarity-based loss functions. Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets. We also discuss an engrossing phenomenon which we name as "ASR gravity well" in experiments.

LGAug 31, 2024
Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging

Yangfan Deng, Hamad Albidah, Ahmed Dallal et al.

Sleep is crucial for human health, and EEG signals play a significant role in sleep research. Due to the high-dimensional nature of EEG signal data sequences, data visualization and clustering of different sleep stages have been challenges. To address these issues, we propose a two-stage hierarchical and explainable feature selection framework by incorporating a feature selection algorithm to improve the performance of dimensionality reduction. Inspired by topological data analysis, which can analyze the structure of high-dimensional data, we extract topological features from the EEG signals to compensate for the structural information loss that happens in traditional spectro-temporal data analysis. Supported by the topological visualization of the data from different sleep stages and the classification results, the proposed features are proven to be effective supplements to traditional features. Finally, we compare the performances of three dimensionality reduction algorithms: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). Among them, t-SNE achieved the highest accuracy of 79.8%, but considering the overall performance in terms of computational resources and metrics, UMAP is the optimal choice.

CVNov 15, 2024
Mechanisms of Generative Image-to-Image Translation Networks

Guangzong Chen, Mingui Sun, Zhi-Hong Mao et al.

Generative Adversarial Networks (GANs) are a class of neural networks that have been widely used in the field of image-to-image translation. In this paper, we propose a streamlined image-to-image translation network with a simpler architecture compared to existing models. We investigate the relationship between GANs and autoencoders and provide an explanation for the efficacy of employing only the GAN component for tasks involving image translation. We show that adversarial for GAN models yields results comparable to those of existing methods without additional complex loss penalties. Subsequently, we elucidate the rationale behind this phenomenon. We also incorporate experimental results to demonstrate the validity of our findings.

CVDec 18, 2020
Self-supervised Learning with Fully Convolutional Networks

Zhengeng Yang, Hongshan Yu, Yong He et al.

Although deep learning based methods have achieved great success in many computer vision tasks, their performance relies on a large number of densely annotated samples that are typically difficult to obtain. In this paper, we focus on the problem of learning representation from unlabeled data for semantic segmentation. Inspired by two patch-based methods, we develop a novel self-supervised learning framework by formulating the Jigsaw Puzzle problem as a patch-wise classification process and solving it with a fully convolutional network. By learning to solve a Jigsaw Puzzle problem with 25 patches and transferring the learned features to semantic segmentation task on Cityscapes dataset, we achieve a 5.8 percentage point improvement over the baseline model that initialized from random values. Moreover, experiments show that our self-supervised learning method can be applied to different datasets and models. In particular, we achieved competitive performance with the state-of-the-art methods on the PASCAL VOC2012 dataset using significant fewer training images.

CVMar 16, 2019
Real time backbone for semantic segmentation

Zhengeng Yang, Hongshan Yu, Qiang Fu et al.

The rapid development of autonomous driving in recent years presents lots of challenges for scene understanding. As an essential step towards scene understanding, semantic segmentation thus received lots of attention in past few years. Although deep learning based state-of-the-arts have achieved great success in improving the segmentation accuracy, most of them suffer from an inefficiency problem and can hardly applied to practical applications. In this paper, we systematically analyze the computation cost of Convolutional Neural Network(CNN) and found that the inefficiency of CNN is mainly caused by its wide structure rather than the deep structure. In addition, the success of pruning based model compression methods proved that there are many redundant channels in CNN. Thus, we designed a very narrow while deep backbone network to improve the efficiency of semantic segmentation. By casting our network to FCN32 segmentation architecture, the basic structure of most segmentation methods, we achieved 60.6\% mIoU on Cityscape val dataset with 54 frame per seconds(FPS) on $1024\times2048$ inputs, which already outperforms one of the earliest real time deep learning based segmentation methods: ENet.