Huali Xu

CV
h-index: 12
Papers: 6
Citations: 105
Novelty: 43%
AI Score: 30

6 Papers

CV · Mar 15, 2023
Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey

Huali Xu, Shuaifeng Zhi, Shuzhou Sun et al.

While deep learning excels in computer vision tasks with abundant labeled data, its performance degrades significantly when labeled samples are limited. To address this, few-shot learning (FSL) enables models to perform target tasks with very few labeled examples by leveraging prior knowledge from related tasks. However, traditional FSL assumes that the related and target tasks come from the same domain, a restrictive assumption in many real-world scenarios where domain differences are common. To overcome this limitation, cross-domain few-shot learning (CDFSL) has gained attention, as it allows source and target data to come from different domains and label spaces. This paper presents the first comprehensive review of CDFSL, a field that has received less attention than traditional FSL due to its unique challenges. We aim to provide both a position paper and a tutorial for researchers, covering key problems, existing methods, and future research directions. The review begins with a formal definition of CDFSL and an outline of its core challenges, followed by a systematic analysis of current approaches organized under a clear taxonomy. Finally, we discuss promising future directions in terms of problem setups, applications, and theoretical advancements.
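To make the few-shot setting above concrete, here is a minimal sketch of how a single N-way K-shot episode could be drawn from a labeled dataset. It is an illustration only and is not taken from the survey; the function name `sample_episode` and its parameters are hypothetical. In a cross-domain setup, training episodes would be drawn from a source dataset and evaluation episodes from a target dataset in a different domain.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Sample one N-way K-shot episode from a list of class labels.

    Returns (support, query): lists of (sample_index, episode_label) pairs,
    where episode_label is an integer in range(n_way).
    """
    rng = rng or random.Random()

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples for support + query are usable.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    classes = rng.sample(sorted(eligible), n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy dataset: 10 classes with 20 samples each (hypothetical data).
toy_labels = [c for c in range(10) for _ in range(20)]
support, query = sample_episode(toy_labels, n_way=5, k_shot=1, n_query=15,
                                rng=random.Random(0))
```

Each episode relabels the chosen classes as 0 to N-1, so a model cannot rely on global class identities and has to adapt from the K support samples alone.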

CV · Aug 17, 2022
Cross-Domain Few-Shot Classification via Inter-Source Stylization

Huali Xu, Shuaifeng Zhi, Li Liu

The goal of Cross-Domain Few-Shot Classification (CDFSC) is to accurately classify a target dataset with limited labelled data by exploiting the knowledge of a richly labelled auxiliary dataset, despite the differences between the domains of the two datasets. Some existing approaches require labelled samples from multiple domains for model training. However, these methods fail when sample labels are scarce. To overcome this challenge, this paper proposes a solution that makes use of multiple source domains without additional labelling costs. Specifically, one of the source domains is fully labelled, while the others are unlabelled. An Inter-Source Stylization Network (ISSNet) is then introduced to enhance stylization across multiple source domains, enriching the data distribution and improving the model's generalization capabilities. Experiments on 8 target datasets show that ISSNet leverages unlabelled data from multiple source domains and significantly reduces the negative impact of domain gaps on classification performance compared to several baseline methods.

CV · Nov 15, 2024
Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning

Huali Xu, Li Liu, Tianpeng Liu et al.

Existing cross-domain few-shot learning (CDFSL) methods, which develop source-domain training strategies to enhance model transferability, face challenges with large-scale pre-trained models (LMs) because the source data and training strategies are inaccessible. Moreover, fine-tuning LMs for CDFSL demands substantial computational resources, limiting practicality. This paper addresses the source-free CDFSL (SF-CDFSL) problem, tackling few-shot learning (FSL) in the target domain using only pre-trained models and a few target samples, without source data or training strategies. To overcome the challenge of inaccessible source data, this paper introduces Step-wise Distribution Alignment Guided Style Prompt Tuning (StepSPT), which implicitly narrows domain gaps through prediction distribution optimization. StepSPT proposes a style prompt to align target samples with the desired distribution and adopts a dual-phase optimization process. In the external process, a step-wise distribution alignment strategy factorizes prediction distribution optimization into a multi-step alignment problem to tune the style prompt. In the internal process, the classifier is updated using standard cross-entropy loss. Evaluations on five datasets demonstrate that StepSPT outperforms existing prompt tuning-based methods and state-of-the-art methods. Ablation studies further verify its effectiveness. Code will be made publicly available at https://github.com/xuhuali-mxj/StepSPT.

CV · Mar 4, 2024
Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning

Huali Xu, Li Liu, Shuaifeng Zhi et al.

Existing Cross-Domain Few-Shot Learning (CDFSL) methods require access to source domain data to train a model in the pre-training phase. However, due to increasing concerns about data privacy and the desire to reduce data transmission and training costs, it is necessary to develop a CDFSL solution that does not access source data. For this reason, this paper explores a Source-Free CDFSL (SF-CDFSL) problem, in which CDFSL is addressed using existing pretrained models instead of training a model on source data. This paper proposes an Enhanced Information Maximization with Distance-Aware Contrastive Learning (IM-DCL) method to address these challenges. First, we introduce a transductive mechanism for learning from the query set. Second, information maximization (IM) is explored to map target samples into predictions that are both individually certain and globally diverse, helping the source model better fit the target data distribution. However, IM fails to learn the decision boundary of the target task. This motivates us to introduce a novel approach called Distance-Aware Contrastive Learning (DCL), in which we consider the entire feature set as both positive and negative sets, akin to Schrödinger's concept of a dual state. Instead of a rigid separation between positive and negative sets, we employ a weighted distance calculation among features to establish a soft classification of the positive and negative sets for the entire feature set. Furthermore, we address issues related to IM by incorporating contrastive constraints between object features and their corresponding positive and negative sets. Evaluations on the 4 datasets of the BSCD-FSL benchmark indicate that the proposed IM-DCL, without accessing the source domain, outperforms existing methods, especially on distant-domain tasks.
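The "individual certainty and global diversity" idea behind information maximization can be sketched as follows. This is a common formulation of an IM objective (mean per-sample entropy minus the entropy of the batch-averaged prediction), written in plain Python for readability. It is an assumption that the paper's loss has this general shape; the abstract does not give the exact form or weighting, and the function names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def information_maximization_loss(batch_logits):
    """IM loss sketch: lower is better.

    First term: mean entropy of each prediction (pushes individual certainty).
    Second term: entropy of the mean prediction, subtracted so that minimizing
    the loss spreads predictions across classes (pushes global diversity).
    """
    probs = [softmax(logits) for logits in batch_logits]
    n, k = len(probs), len(probs[0])
    mean_sample_entropy = sum(entropy(p) for p in probs) / n
    mean_pred = [sum(p[j] for p in probs) / n for j in range(k)]
    return mean_sample_entropy - entropy(mean_pred)


# Confident predictions spread over both classes score lower (better)
# than confident predictions that all collapse onto one class.
diverse = information_maximization_loss([[10.0, 0.0], [0.0, 10.0]])
collapsed = information_maximization_loss([[10.0, 0.0], [10.0, 0.0]])
```

The objective rewards confident, well-spread predictions but, as the abstract notes, says nothing directly about where the decision boundary lies, which is the gap the contrastive term is meant to address.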

CV · Jun 11, 2020
An Edge Information and Mask Shrinking Based Image Inpainting Approach

Huali Xu, Xiangdong Su, Meng Wang et al.

In the image inpainting task, the ability to repair both high-frequency and low-frequency information in the missing regions has a substantial influence on the quality of the restored image. However, existing inpainting methods usually fail to consider both high-frequency and low-frequency information simultaneously. To solve this problem, this paper proposes an image inpainting approach based on edge information and mask shrinking, which consists of two models. The first model is an edge generation model that generates complete edge information from the damaged image, and the second model is an image completion model that fills the missing regions using the generated edge information and the valid contents of the damaged image. The mask shrinking strategy is employed in the image completion model to track the areas to be repaired. The proposed approach is evaluated qualitatively and quantitatively on the Places2 dataset. The results show that our approach outperforms state-of-the-art methods.

AS · May 29, 2020
SNR-Based Teachers-Student Technique for Speech Enhancement

Xiang Hao, Xiangdong Su, Zhiyu Wang et al.

It is very challenging for speech enhancement methods to achieve robust performance under both high and low signal-to-noise ratio (SNR) conditions simultaneously. In this paper, we propose a method that integrates an SNR-based teachers-student technique and a time-domain U-Net to deal with this problem. Specifically, this method consists of multiple teacher models and a student model. We first train the teacher models under multiple small, non-overlapping SNR ranges so that each performs speech enhancement well within its specific SNR range. Then, we choose different teacher models to supervise the training of the student model according to the SNR of the training data. Eventually, the student model can perform speech enhancement under both high SNR and low SNR. To evaluate the proposed method, we constructed a dataset with SNRs ranging from -20 dB to 20 dB based on a public dataset. We experimentally analyzed the effectiveness of the SNR-based teachers-student technique and compared the proposed method with several state-of-the-art methods.
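The SNR-dependent choice of teacher can be illustrated with a small sketch: compute the SNR of a clean/noise pair, rescale the noise to reach a target SNR, and pick the teacher whose range covers that SNR. The four 10 dB ranges in `TEACHER_RANGES` are hypothetical; the abstract only states that the ranges are small, do not overlap, and that the dataset spans -20 dB to 20 dB. Function names are likewise assumptions for illustration.

```python
import math


def snr_db(clean, noise):
    """SNR in dB from the energies of a clean signal and a noise signal."""
    p_clean = sum(x * x for x in clean)
    p_noise = sum(x * x for x in noise)
    return 10.0 * math.log10(p_clean / p_noise)


def mix_at_snr(clean, noise, target_db):
    """Scale the noise so the mixture has the target SNR.

    Returns (mixture, scaled_noise).
    """
    # Scaling noise by g shifts the SNR by -20*log10(g).
    gain = 10.0 ** ((snr_db(clean, noise) - target_db) / 20.0)
    scaled = [gain * x for x in noise]
    return [c + n for c, n in zip(clean, scaled)], scaled


# Hypothetical non-overlapping teacher ranges in dB, as [low, high).
TEACHER_RANGES = [(-20.0, -10.0), (-10.0, 0.0), (0.0, 10.0), (10.0, 20.0)]


def select_teacher(snr, ranges=TEACHER_RANGES):
    """Return the index of the teacher whose SNR range contains `snr`."""
    for i, (low, high) in enumerate(ranges):
        is_last = i == len(ranges) - 1
        # The last range also includes its upper bound.
        if low <= snr < high or (is_last and snr == high):
            return i
    raise ValueError(f"SNR {snr} dB is outside all teacher ranges")


# Toy signals (hypothetical data), mixed at -5 dB.
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixture, scaled_noise = mix_at_snr(clean, noise, target_db=-5.0)
teacher_index = select_teacher(-5.0)
```

During student training, the selected teacher's output for each mixture would serve as the supervision signal, so the student sees guidance from a specialist at every SNR level.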