Xutao Li

h-index35

26papers

852citations

Novelty52%

AI Score52

Ranked #36,377 of 201,326 authors (top 18%)#14,369 in CV (top 24%)

26 Papers

LGSep 8, 2023Code

UER: A Heuristic Bias Addressing Approach for Online Continual Learning

Huiwei Lin, Shanshan Feng, Baoquan Zhang et al.

Online continual learning aims to continuously train neural networks from a continuous data stream with a single pass-through data. As the most effective approach, the rehearsal-based methods replay part of previous data. Commonly used predictors in existing methods tend to generate biased dot-product logits that prefer to the classes of current data, which is known as a bias issue and a phenomenon of forgetting. Many approaches have been proposed to overcome the forgetting problem by correcting the bias; however, they still need to be improved in online fashion. In this paper, we try to address the bias issue by a more straightforward and more efficient method. By decomposing the dot-product logits into an angle factor and a norm factor, we empirically find that the bias problem mainly occurs in the angle factor, which can be used to learn novel knowledge as cosine logits. On the contrary, the norm factor abandoned by existing methods helps remember historical knowledge. Based on this observation, we intuitively propose to leverage the norm factor to balance the new and old knowledge for addressing the bias. To this end, we develop a heuristic approach called unbias experience replay (UER). UER learns current samples only by the angle factor and further replays previous samples by both the norm and angle factors. Extensive experiments on three datasets show that UER achieves superior performance over various state-of-the-art methods. The code is in https://github.com/FelixHuiweiLin/UER.

CVApr 10, 2023

PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning

Huiwei Lin, Baoquan Zhang, Shanshan Feng et al.

Online class-incremental continual learning is a specific task of continual learning. It aims to continuously learn new classes from data stream and the samples of data stream are seen only once, which suffers from the catastrophic forgetting issue, i.e., forgetting historical knowledge of old classes. Existing replay-based methods effectively alleviate this issue by saving and replaying part of old data in a proxy-based or contrastive-based replay manner. Although these two replay manners are effective, the former would incline to new classes due to class imbalance issues, and the latter is unstable and hard to converge because of the limited number of samples. In this paper, we conduct a comprehensive analysis of these two replay manners and find that they can be complementary. Inspired by this finding, we propose a novel replay-based method called proxy-based contrastive replay (PCR). The key operation is to replace the contrastive samples of anchors with corresponding proxies in the contrastive-based way. It alleviates the phenomenon of catastrophic forgetting by effectively addressing the imbalance issue, as well as keeps a faster convergence of the model. We conduct extensive experiments on three real-world benchmark datasets, and empirical results consistently demonstrate the superiority of PCR over various state-of-the-art methods.

LGMar 3, 2022

MetaDT: Meta Decision Tree with Class Hierarchy for Interpretable Few-Shot Learning

Baoquan Zhang, Hao Jiang, Xutao Li et al.

Few-Shot Learning (FSL) is a challenging task, which aims to recognize novel classes with few examples. Recently, lots of methods have been proposed from the perspective of meta-learning and representation learning. However, few works focus on the interpretability of FSL decision process. In this paper, we take a step towards the interpretable FSL by proposing a novel meta-learning based decision tree framework, namely, MetaDT. In particular, the FSL interpretability is achieved from two aspects, i.e., a concept aspect and a visual aspect. On the concept aspect, we first introduce a tree-like concept hierarchy as FSL prior. Then, resorting to the prior, we split each few-shot task to a set of subtasks with different concept levels and then perform class prediction via a model of decision tree. The advantage of such design is that a sequence of high-level concept decisions that lead up to a final class prediction can be obtained, which clarifies the FSL decision process. On the visual aspect, a set of subtask-specific classifiers with visual attention mechanism is designed to perform decision at each node of the decision tree. As a result, a subtask-specific heatmap visualization can be obtained to achieve the decision interpretability of each tree node. At last, to alleviate the data scarcity issue of FSL, we regard the prior of concept hierarchy as an undirected graph, and then design a graph convolution-based decision tree inference network as our meta-learner to infer parameters of the decision tree. Extensive experiments on performance comparison and interpretability analysis show superiority of our MetaDT.

LGJul 31, 2023

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

Baoquan Zhang, Chuyao Luo, Demin Yu et al.

Equipping a deep model the abaility of few-shot learning, i.e., learning quickly from only few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address the challenge by learning how to learn novel tasks. Its key idea is learning a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverage it to optimize a task-specific model by using only few labeled data. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes considerable memory burdens and the risk of vanishing gradients. Drawing inspiration from recent progress of diffusion models, we find that the inner-loop gradient descent process can be actually viewed as a reverse process (i.e., denoising) of diffusion where the target of denoising is model weights but the origin data. Based on this fact, in this paper, we propose to model the gradient descent optimizer as a diffusion model and then present a novel task-conditional diffusion-based meta-learning, called MetaDiff, that effectively models the optimization process of model weights from Gaussion noises to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, our MetaDiff do not need to differentiate through the inner-loop path such that the memory burdens and the risk of vanishing gradients can be effectvely alleviated. Experiment results show that our MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks.

CVMar 28

SJD-VP: Speculative Jacobi Decoding with Verification Prediction for Autoregressive Image Generation

Bingqi Shan, Baoquan Zhang, Xiaochen Qi et al.

Speculative Jacobi Decoding (SJD) has emerged as a promising method for accelerating autoregressive image generation. Despite its potential, existing SJD approaches often suffer from the low acceptance rate issue of speculative tokens due to token selection ambiguity. Recent works attempt to mitigate this issue primarily from the relaxed token verification perspective but fail to fully exploit the iterative dynamics of decoding. In this paper, we conduct an in-depth analysis and make a novel observation that tokens whose probabilities increase are more likely to match the verification-accepted and correct token. Based on this, we propose a novel Speculative Jacobi Decoding with Verification Prediction (SJD-VP). The key idea is to leverage the change in token probabilities across iterations to guide sampling, favoring tokens whose probabilities increase. This effectively predicts which tokens are likely to pass subsequent verification, boosting the acceptance rate. In particular, our SJD-VP is plug-and-play and can be seamlessly integrated into existing SJD methods. Extensive experiments on standard benchmarks demonstrate that our SJD-VP method consistently accelerates autoregressive decoding while improving image generation quality.

LGSep 26, 2023

HPCR: Holistic Proxy-based Contrastive Replay for Online Continual Learning

Huiwei Lin, Shanshan Feng, Baoquan Zhang et al.

Online continual learning, aimed at developing a neural network that continuously learns new data from a single pass over an online data stream, generally suffers from catastrophic forgetting. Existing replay-based methods alleviate forgetting by replaying partial old data in a proxy-based or contrastive-based replay manner, each with its own shortcomings. Our previous work proposes a novel replay-based method called proxy-based contrastive replay (PCR), which handles the shortcomings by achieving complementary advantages of both replay manners. In this work, we further conduct gradient and limitation analysis of PCR. The analysis results show that PCR still can be further improved in feature extraction, generalization, and anti-forgetting capabilities of the model. Hence, we develop a more advanced method named holistic proxy-based contrastive replay (HPCR). HPCR consists of three components, each tackling one of the limitations of PCR. The contrastive component conditionally incorporates anchor-to-sample pairs to PCR, improving the feature extraction ability. The second is a temperature component that decouples the temperature coefficient into two parts based on their gradient impacts and sets different values for them to enhance the generalization ability. The third is a distillation component that constrains the learning process with additional loss terms to improve the anti-forgetting ability. Experiments on four datasets consistently demonstrate the superiority of HPCR over various state-of-the-art methods.

CVDec 11, 2023Code

DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting

Demin Yu, Xutao Li, Yunming Ye et al.

Precipitation nowcasting is an important spatio-temporal prediction task to predict the radar echoes sequences based on current observations, which can serve both meteorological science and smart city applications. Due to the chaotic evolution nature of the precipitation systems, it is a very challenging problem. Previous studies address the problem either from the perspectives of deterministic modeling or probabilistic modeling. However, their predictions suffer from the blurry, high-value echoes fading away and position inaccurate issues. The root reason of these issues is that the chaotic evolutionary precipitation systems are not appropriately modeled. Inspired by the nature of the systems, we propose to decompose and model them from the perspective of global deterministic motion and local stochastic variations with residual mechanism. A unified and flexible framework that can equip any type of spatio-temporal models is proposed based on residual diffusion, which effectively tackles the shortcomings of previous methods. Extensive experimental results on four publicly available radar datasets demonstrate the effectiveness and superiority of the proposed framework, compared to state-of-the-art techniques. Our code is publicly available at https://github.com/DeminYu98/DiffCast.

CVSep 29, 2025Code

BFSM: 3D Bidirectional Face-Skull Morphable Model

Zidu Wang, Meng Xu, Miao Xu et al.

Building a joint face-skull morphable model holds great potential for applications such as remote diagnostics, surgical planning, medical education, and physically based facial simulation. However, realizing this vision is constrained by the scarcity of paired face-skull data, insufficient registration accuracy, and limited exploration of reconstruction and clinical applications. Moreover, individuals with craniofacial deformities are often overlooked, resulting in underrepresentation and limited inclusivity. To address these challenges, we first construct a dataset comprising over 200 samples, including both normal cases and rare craniofacial conditions. Each case contains a CT-based skull, a CT-based face, and a high-fidelity textured face scan. Secondly, we propose a novel dense ray matching registration method that ensures topological consistency across face, skull, and their tissue correspondences. Based on this, we introduce the 3D Bidirectional Face-Skull Morphable Model (BFSM), which enables shape inference between the face and skull through a shared coefficient space, while also modeling tissue thickness variation to support one-to-many facial reconstructions from the same skull, reflecting individual changes such as fat over time. Finally, we demonstrate the potential of BFSM in medical applications, including 3D face-skull reconstruction from a single image and surgical planning prediction. Extensive experiments confirm the robustness and accuracy of our method. BFSM is available at https://github.com/wang-zidu/BFSM

CVSep 10, 2020Code

Prototype Completion with Primitive Knowledge for Few-Shot Learning

Baoquan Zhang, Xutao Li, Yunming Ye et al.

Few-shot learning is a challenging task, which aims to learn a classifier for novel classes with few examples. Pre-training based meta-learning methods effectively tackle the problem by pre-training a feature extractor and then fine-tuning it through the nearest centroid based meta-learning. However, results show that the fine-tuning step makes very marginal improvements. In this paper, 1) we figure out the key reason, i.e., in the pre-trained feature space, the base classes already form compact clusters while novel classes spread as groups with large variances, which implies that fine-tuning the feature extractor is less meaningful; 2) instead of fine-tuning the feature extractor, we focus on estimating more representative prototypes during meta-learning. Consequently, we propose a novel prototype completion based meta-learning framework. This framework first introduces primitive knowledge (i.e., class-level part or attribute annotations) and extracts representative attribute features as priors. Then, we design a prototype completion network to learn to complete prototypes with these priors. To avoid the prototype completion error caused by primitive knowledge noises or class differences, we further develop a Gaussian based prototype fusion strategy that combines the mean-based and completed prototypes by exploiting the unlabeled samples. Extensive experiments show that our method: (i) can obtain more accurate prototypes; (ii) outperforms state-of-the-art techniques by 2% - 9% in terms of classification accuracy. Our code is available online.

CVMar 15, 2024

Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling

Baoquan Zhang, Huaibin Wang, Luo Chuyao et al.

Vector-Quantized Image Modeling (VQIM) is a fundamental research problem in image synthesis, which aims to represent an image with a discrete token sequence. Existing studies effectively address this problem by learning a discrete codebook from scratch and in a code-independent manner to quantize continuous representations into discrete tokens. However, learning a codebook from scratch and in a code-independent manner is highly challenging, which may be a key reason causing codebook collapse, i.e., some code vectors can rarely be optimized without regard to the relationship between codes and good codebook priors such that die off finally. In this paper, inspired by pretrained language models, we find that these language models have actually pretrained a superior codebook via a large number of text corpus, but such information is rarely exploited in VQIM. To this end, we propose a novel codebook transfer framework with part-of-speech, called VQCT, which aims to transfer a well-trained codebook from pretrained language models to VQIM for robust codebook learning. Specifically, we first introduce a pretrained codebook from language models and part-of-speech knowledge as priors. Then, we construct a vision-related codebook with these priors for achieving codebook transfer. Finally, a novel codebook transfer network is designed to exploit abundant semantic relationships between codes contained in pretrained codebooks for robust VQIM codebook learning. Experimental results on four datasets show that our VQCT method achieves superior VQIM performance over previous state-of-the-art methods.

CVMay 23, 2024

LG-VQ: Language-Guided Codebook Learning

Guotao Liang, Baoquan Zhang, Yaowei Wang et al.

Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regression manner. Although existing methods have shown superior performance, most methods prefer to learn a single-modal codebook (\emph{e.g.}, image), resulting in suboptimal performance when the codebook is applied to multi-modal downstream tasks (\emph{e.g.}, text-to-image, image captioning) due to the existence of modal gaps. In this paper, we propose a novel language-guided codebook learning framework, called LG-VQ, which aims to learn a codebook that can be aligned with the text to improve the performance of multi-modal downstream tasks. Specifically, we first introduce pre-trained text semantics as prior knowledge, then design two novel alignment modules (\emph{i.e.}, Semantic Alignment Module, and Relationship Alignment Module) to transfer such prior knowledge into codes for achieving codebook text alignment. In particular, our LG-VQ method is model-agnostic, which can be easily integrated into existing VQ models. Experimental results show that our method achieves superior performance on reconstruction and various multi-modal downstream tasks.

LGApr 9, 2025

From Text to Time? Rethinking the Effectiveness of the Large Language Model for Time Series Forecasting

Xinyu Zhang, Shanshan Feng, Xutao Li

Using pre-trained large language models (LLMs) as the backbone for time series prediction has recently gained significant research interest. However, the effectiveness of LLM backbones in this domain remains a topic of debate. Based on thorough empirical analyses, we observe that training and testing LLM-based models on small datasets often leads to the Encoder and Decoder becoming overly adapted to the dataset, thereby obscuring the true predictive capabilities of the LLM backbone. To investigate the genuine potential of LLMs in time series prediction, we introduce three pre-training models with identical architectures but different pre-training strategies. Thereby, large-scale pre-training allows us to create unbiased Encoder and Decoder components tailored to the LLM backbone. Through controlled experiments, we evaluate the zero-shot and few-shot prediction performance of the LLM, offering insights into its capabilities. Extensive experiments reveal that although the LLM backbone demonstrates some promise, its forecasting performance is limited. Our source code is publicly available in the anonymous repository: https://anonymous.4open.science/r/LLM4TS-0B5C.

LGApr 16, 2024

Four-hour thunderstorm nowcasting using deep diffusion models of satellite

Kuai Dai, Xutao Li, Junying Fang et al.

Convection (thunderstorm) develops rapidly within hours and is highly destructive, posing a significant challenge for nowcasting and resulting in substantial losses to nature and society. After the emergence of artificial intelligence (AI)-based methods, convection nowcasting has experienced rapid advancements, with its performance surpassing that of physics-based numerical weather prediction and other conventional approaches. However, the lead time and coverage of it still leave much to be desired and hardly meet the needs of disaster emergency response. Here, we propose deep diffusion models of satellite (DDMS) to establish an AI-based convection nowcasting system. Specifically, DDMS employs diffusion processes to effectively simulate complicated spatiotemporal evolution patterns of convective clouds, significantly improving the forecast lead time. Additionally, it combines geostationary satellite brightness temperature data and domain knowledge from meteorological experts, thereby achieving planetary-scale forecast coverage. During long-term tests and objective validation based on the FengYun-4A satellite, our system achieves, for the first time, effective convection nowcasting up to 4 hours, with broad coverage (about 20,000,000 km$^2$), remarkable accuracy, and high resolution (15 minutes; 4 km). Its performance reaches a new height in convection nowcasting compared to the existing models. In terms of application, our system is highly transferable with the potential to collaborate with multiple satellites for global convection nowcasting. Furthermore, our results highlight the remarkable capabilities of diffusion models in convective clouds forecasting, as well as the significant value of geostationary satellite data when empowered by AI technologies.

CVJul 23, 2025

PointLAMA: Latent Attention meets Mamba for Efficient Point Cloud Pretraining

Xuanyu Lin, Xiaona Zeng, Xianwei Zheng et al.

Mamba has recently gained widespread attention as a backbone model for point cloud modeling, leveraging a state-space architecture that enables efficient global sequence modeling with linear complexity. However, its lack of local inductive bias limits its capacity to capture fine-grained geometric structures in 3D data. To address this limitation, we propose \textbf{PointLAMA}, a point cloud pretraining framework that combines task-aware point cloud serialization, a hybrid encoder with integrated Latent Attention and Mamba blocks, and a conditional diffusion mechanism built upon the Mamba backbone. Specifically, the task-aware point cloud serialization employs Hilbert/Trans-Hilbert space-filling curves and axis-wise sorting to structurally align point tokens for classification and segmentation tasks, respectively. Our lightweight Latent Attention block features a Point-wise Multi-head Latent Attention (PMLA) module, which is specifically designed to align with the Mamba architecture by leveraging the shared latent space characteristics of PMLA and Mamba. This enables enhanced local context modeling while preserving overall efficiency. To further enhance representation learning, we incorporate a conditional diffusion mechanism during pretraining, which denoises perturbed feature sequences without relying on explicit point-wise reconstruction. Experimental results demonstrate that PointLAMA achieves competitive performance on multiple benchmark datasets with minimal parameter count and FLOPs, validating its effectiveness for efficient point cloud pretraining.

CVNov 19, 2024

Prototype Optimization with Neural ODE for Few-Shot Learning

Baoquan Zhang, Shanshan Feng, Bingqi Shan et al.

Few-Shot Learning (FSL) is a challenging task, which aims to recognize novel classes with few examples. Pre-training based methods effectively tackle the problem by pre-training a feature extractor and then performing class prediction via a cosine classifier with mean-based prototypes. Nevertheless, due to the data scarcity, the mean-based prototypes are usually biased. In this paper, we attempt to diminish the prototype bias by regarding it as a prototype optimization problem. To this end, we propose a novel prototype optimization framework to rectify prototypes, i.e., introducing a meta-optimizer to optimize prototypes. Although the existing meta-optimizers can also be adapted to our framework, they all overlook a crucial gradient bias issue, i.e., the mean-based gradient estimation is also biased on sparse data. To address this issue, in this paper, we regard the gradient and its flow as meta-knowledge and then propose a novel Neural Ordinary Differential Equation (ODE)-based meta-optimizer to optimize prototypes, called MetaNODE. Although MetaNODE has shown superior performance, it suffers from a huge computational burden. To further improve its computation efficiency, we conduct a detailed analysis on MetaNODE and then design an effective and efficient MetaNODE extension version (called E2MetaNODE). It consists of two novel modules: E2GradNet and E2Solver, which aim to estimate accurate gradient flows and solve optimal prototypes in an effective and efficient manner, respectively. Extensive experiments show that 1) our methods achieve superior performance over previous FSL methods and 2) our E2MetaNODE significantly improves computation efficiency meanwhile without performance degradation.

CVApr 26, 2024

MCSDNet: Mesoscale Convective System Detection Network via Multi-scale Spatiotemporal Information

Jiajun Liang, Baoquan Zhang, Yunming Ye et al.

The accurate detection of Mesoscale Convective Systems (MCS) is crucial for meteorological monitoring due to their potential to cause significant destruction through severe weather phenomena such as hail, thunderstorms, and heavy rainfall. However, the existing methods for MCS detection mostly targets on single-frame detection, which just considers the static characteristics and ignores the temporal evolution in the life cycle of MCS. In this paper, we propose a novel encoder-decoder neural network for MCS detection(MCSDNet). MCSDNet has a simple architecture and is easy to expand. Different from the previous models, MCSDNet targets on multi-frames detection and leverages multi-scale spatiotemporal information for the detection of MCS regions in remote sensing imagery(RSI). As far as we know, it is the first work to utilize multi-scale spatiotemporal information to detect MCS regions. Firstly, we design a multi-scale spatiotemporal information module to extract multi-level semantic from different encoder levels, which makes our models can extract more detail spatiotemporal features. Secondly, a Spatiotemporal Mix Unit(STMU) is introduced to MCSDNet to capture both intra-frame features and inter-frame correlations, which is a scalable module and can be replaced by other spatiotemporal module, e.g., CNN, RNN, Transformer and our proposed Dual Spatiotemporal Attention(DSTA). This means that the future works about spatiotemporal modules can be easily integrated to our model. Finally, we present MCSRSI, the first publicly available dataset for multi-frames MCS detection based on visible channel images from the FY-4A satellite. We also conduct several experiments on MCSRSI and find that our proposed MCSDNet achieve the best performance on MCS detection task when comparing to other baseline methods.

CVOct 9, 2021

SGMNet: Scene Graph Matching Network for Few-Shot Remote Sensing Scene Classification

Baoquan Zhang, Shanshan Feng, Xutao Li et al.

Few-Shot Remote Sensing Scene Classification (FSRSSC) is an important task, which aims to recognize novel scene classes with few examples. Recently, several studies attempt to address the FSRSSC problem by following few-shot natural image classification methods. These existing methods have made promising progress and achieved superior performance. However, they all overlook two unique characteristics of remote sensing images: (i) object co-occurrence that multiple objects tend to appear together in a scene image and (ii) object spatial correlation that these co-occurrence objects are distributed in the scene image following some spatial structure patterns. Such unique characteristics are very beneficial for FSRSSC, which can effectively alleviate the scarcity issue of labeled remote sensing images since they can provide more refined descriptions for each scene class. To fully exploit these characteristics, we propose a novel scene graph matching-based meta-learning framework for FSRSSC, called SGMNet. In this framework, a scene graph construction module is carefully designed to represent each test remote sensing image or each scene class as a scene graph, where the nodes reflect these co-occurrence objects meanwhile the edges capture the spatial correlations between these co-occurrence objects. Then, a scene graph matching module is further developed to evaluate the similarity score between each test remote sensing image and each scene class. Finally, based on the similarity scores, we perform the scene class prediction via a nearest neighbor classifier. We conduct extensive experiments on UCMerced LandUse, WHU19, AID, and NWPU-RESISC45 datasets. The experimental results show that our method obtains superior performance over the previous state-of-the-art methods.

LGOct 3, 2021

RAP-Net: Region Attention Predictive Network for Precipitation Nowcasting

Chuyao Luo, ZhengZhang, Rui Ye et al.

Natural disasters caused by heavy rainfall often cost huge loss of life and property. To avoid it, the task of precipitation nowcasting is imminent. To solve the problem, increasingly deep learning methods are proposed to forecast future radar echo images and then the predicted maps have converted the distribution of rainfall. The prevailing spatiotemporal sequence prediction methods apply ConvRNN structure which combines the Convolution and Recurrent neural network. Although improvements based on ConvRNN achieve remarkable success, these methods ignore capturing both local and global spatial features simultaneously, which degrades the nowcasting in the region of heavy rainfall. To address this issue, we proposed the Region Attention Block (RAB) and embed it into ConvRNN to enhance the forecast in the area with strong rainfall. Besides, the ConvRNN models are hard to memory longer history representations with limited parameters. Considering it, we propose Recall Attention Mechanism (RAM) to improve the prediction. By preserving longer temporal information, RAM contributes to the forecasting, especially in the middle rainfall intensity. The experiments show that the proposed model Region Attention Predictive Network (RAP-Net) has outperformed the state-of-art method.

CVAug 11, 2021

Prototype Completion for Few-Shot Learning

Baoquan Zhang, Xutao Li, Yunming Ye et al.

Few-shot learning aims to recognize novel classes with few examples. Pre-training based methods effectively tackle the problem by pre-training a feature extractor and then fine-tuning it through the nearest centroid based meta-learning. However, results show that the fine-tuning step makes marginal improvements. In this paper, 1) we figure out the reason, i.e., in the pre-trained feature space, the base classes already form compact clusters while novel classes spread as groups with large variances, which implies that fine-tuning feature extractor is less meaningful; 2) instead of fine-tuning feature extractor, we focus on estimating more representative prototypes. Consequently, we propose a novel prototype completion based meta-learning framework. This framework first introduces primitive knowledge (i.e., class-level part or attribute annotations) and extracts representative features for seen attributes as priors. Second, a part/attribute transfer network is designed to learn to infer the representative features for unseen attributes as supplementary priors. Finally, a prototype completion network is devised to learn to complete prototypes with these priors. Moreover, to avoid the prototype completion error, we further develop a Gaussian based prototype fusion strategy that fuses the mean-based and completed prototypes by exploiting the unlabeled samples. Extensive experiments show that our method: (i) obtains more accurate prototypes; (ii) achieves superior performance on both inductive and transductive FSL settings.

CVMar 26, 2021

MetaNODE: Prototype Optimization as a Neural ODE for Few-Shot Learning

Baoquan Zhang, Xutao Li, Shanshan Feng et al.

Few-Shot Learning (FSL) is a challenging task, \emph{i.e.}, how to recognize novel classes with few examples? Pre-training based methods effectively tackle the problem by pre-training a feature extractor and then predicting novel classes via a cosine nearest neighbor classifier with mean-based prototypes. Nevertheless, due to the data scarcity, the mean-based prototypes are usually biased. In this paper, we attempt to diminish the prototype bias by regarding it as a prototype optimization problem. To this end, we propose a novel meta-learning based prototype optimization framework to rectify prototypes, \emph{i.e.}, introducing a meta-optimizer to optimize prototypes. Although the existing meta-optimizers can also be adapted to our framework, they all overlook a crucial gradient bias issue, \emph{i.e.}, the mean-based gradient estimation is also biased on sparse data. To address the issue, we regard the gradient and its flow as meta-knowledge and then propose a novel Neural Ordinary Differential Equation (ODE)-based meta-optimizer to polish prototypes, called MetaNODE. In this meta-optimizer, we first view the mean-based prototypes as initial prototypes, and then model the process of prototype optimization as continuous-time dynamics specified by a Neural ODE. A gradient flow inference network is carefully designed to learn to estimate the continuous gradient flow for prototype dynamics. Finally, the optimal prototypes can be obtained by solving the Neural ODE. Extensive experiments on miniImagenet, tieredImagenet, and CUB-200-2011 show the effectiveness of our method.

LGAug 6, 2020

Multi-source Heterogeneous Domain Adaptation with Conditional Weighting Adversarial Network

Yuan Yao, Xutao Li, Yu Zhang et al.

Heterogeneous domain adaptation (HDA) tackles the learning of cross-domain samples with both different probability distributions and feature representations. Most of the existing HDA studies focus on the single-source scenario. In reality, however, it is not uncommon to obtain samples from multiple heterogeneous domains. In this article, we study the multisource HDA problem and propose a conditional weighting adversarial network (CWAN) to address it. The proposed CWAN adversarially learns a feature transformer, a label classifier, and a domain discriminator. To quantify the importance of different source domains, CWAN introduces a sophisticated conditional weighting scheme to calculate the weights of the source domains according to the conditional distribution divergence between the source and target domains. Different from existing weighting schemes, the proposed conditional weighting scheme not only weights the source domains but also implicitly aligns the conditional distributions during the optimization process. Experimental results clearly demonstrate that the proposed CWAN performs much better than several state-of-the-art methods on four real-world datasets.

CVJul 5, 2020

MetaConcept: Learn to Abstract via Concept Graph for Weakly-Supervised Few-Shot Learning

Baoquan Zhang, Ka-Cheong Leung, Yunming Ye et al.

Meta-learning has been proved to be an effective framework to address few-shot learning problems. The key challenge is how to minimize the generalization error of base learner across tasks. In this paper, we explore the concept hierarchy knowledge by leveraging concept graph, and take the concept graph as explicit meta-knowledge for the base learner, instead of learning implicit meta-knowledge, so as to boost the classification performance of meta-learning on weakly-supervised few-shot learning problems. To this end, we propose a novel meta-learning framework, called MetaConcept, which learns to abstract concepts via the concept graph. Specifically, we firstly propose a novel regularization with multi-level conceptual abstraction to constrain a meta-learner to learn to abstract concepts via the concept graph (i.e. identifying the concepts from low to high levels). Then, we propose a meta concept inference network as the meta-learner for the base learner, aiming to quickly adapt to a novel task by the joint inference of the abstract concepts and a few annotated samples. We have conducted extensive experiments on two weakly-supervised few-shot learning benchmarks, namely, WS-ImageNet-Pure and WS-ImageNet-Mix. Our experimental results show that 1) the proposed MetaConcept outperforms state-of-the-art methods with an improvement of 2% to 6% in classification accuracy; 2) the proposed MetaConcept can be able to yield a good performance though merely training with weakly-labeled data sets.

IRApr 25, 2020

Inter-sequence Enhanced Framework for Personalized Sequential Recommendation

Feng Liu, Weiwen Liu, Xutao Li et al.

Modeling the sequential correlation of users' historical interactions is essential in sequential recommendation. However, the majority of the approaches mainly focus on modeling the \emph{intra-sequence} item correlation within each individual sequence but neglect the \emph{inter-sequence} item correlation across different user interaction sequences. Though several studies have been aware of this issue, their method is either simple or implicit. To make better use of such information, we propose an inter-sequence enhanced framework for the Sequential Recommendation (ISSR). In ISSR, both inter-sequence and intra-sequence item correlation are considered. Firstly, we equip graph neural networks in the inter-sequence correlation encoder to capture the high-order item correlation from the user-item bipartite graph and the item-item graph. Then, based on the inter-sequence correlation encoder, we build GRU network and attention network in the intra-sequence correlation encoder to model the item sequential correlation within each individual sequence and temporal dynamics for predicting users' preferences over candidate items. Additionally, we conduct extensive experiments on three real-world datasets. The experimental results demonstrate the superiority of ISSR over many state-of-the-art methods and the effectiveness of the inter-sequence correlation encoder.

LGAug 28, 2019

Heterogeneous Domain Adaptation via Soft Transfer Network

Yuan Yao, Yu Zhang, Xutao Li et al.

Heterogeneous domain adaptation (HDA) aims to facilitate the learning task in a target domain by borrowing knowledge from a heterogeneous source domain. In this paper, we propose a Soft Transfer Network (STN), which jointly learns a domain-shared classifier and a domain-invariant subspace in an end-to-end manner, for addressing the HDA problem. The proposed STN not only aligns the discriminative directions of domains but also matches both the marginal and conditional distributions across domains. To circumvent negative transfer, STN aligns the conditional distributions by using the soft-label strategy of unlabeled target data, which prevents the hard assignment of each unlabeled target data to only one category that may be incorrect. Further, STN introduces an adaptive coefficient to gradually increase the importance of the soft-labels since they will become more and more accurate as the number of iterations increases. We perform experiments on the transfer tasks of image-to-image, text-to-image, and text-to-text. Experimental results testify that the STN significantly outperforms several state-of-the-art approaches.

IROct 29, 2018

Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling

Feng Liu, Ruiming Tang, Xutao Li et al.

Recommendation is crucial in both academia and industry, and various techniques are proposed such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. However, most of the previous studies suffer from two limitations: (1) considering the recommendation as a static procedure and ignoring the dynamic interactive nature between users and the recommender systems, (2) focusing on the immediate feedback of recommended items and neglecting the long-term rewards. To address the two limitations, in this paper we propose a novel recommendation framework based on deep reinforcement learning, called DRR. The DRR framework treats recommendation as a sequential decision making procedure and adopts an "Actor-Critic" reinforcement learning scheme to model the interactions between the users and recommender systems, which can consider both the dynamic adaptation and long-term rewards. Furthermore, a state representation module is incorporated into DRR, which can explicitly capture the interactions between items and users. Three instantiation structures are developed. Extensive experiments on four real-world datasets are conducted under both the offline and online evaluation settings. The experimental results demonstrate the proposed DRR method indeed outperforms the state-of-the-art competitors.

IRFeb 23, 2018

Novel Approaches to Accelerating the Convergence Rate of Markov Decision Process for Search Result Diversification

Feng Liu, Ruiming Tang, Xutao Li et al.

Recently, some studies have utilized the Markov Decision Process for diversifying (MDP-DIV) the search results in information retrieval. Though promising performances can be delivered, MDP-DIV suffers from a very slow convergence, which hinders its usability in real applications. In this paper, we aim to promote the performance of MDP-DIV by speeding up the convergence rate without much accuracy sacrifice. The slow convergence is incurred by two main reasons: the large action space and data scarcity. On the one hand, the sequential decision making at each position needs to evaluate the query-document relevance for all the candidate set, which results in a huge searching space for MDP; on the other hand, due to the data scarcity, the agent has to proceed more "trial and error" interactions with the environment. To tackle this problem, we propose MDP-DIV-kNN and MDP-DIV-NTN methods. The MDP-DIV-kNN method adopts a $k$ nearest neighbor strategy, i.e., discarding the $k$ nearest neighbors of the recently-selected action (document), to reduce the diversification searching space. The MDP-DIV-NTN employs a pre-trained diversification neural tensor network (NTN-DIV) as the evaluation model, and combines the results with MDP to produce the final ranking solution. The experiment results demonstrate that the two proposed methods indeed accelerate the convergence rate of the MDP-DIV, which is 3x faster, while the accuracies produced barely degrade, or even are better.