LG · Sep 8, 2023
UER: A Heuristic Bias Addressing Approach for Online Continual Learning
Huiwei Lin, Shanshan Feng, Baoquan Zhang et al.
Online continual learning aims to train neural networks continuously from a data stream in a single pass over the data. Rehearsal-based methods, the most effective approach, replay part of the previous data. Commonly used predictors in existing methods tend to generate biased dot-product logits that favor the classes of the current data, which is known as the bias issue and is a manifestation of forgetting. Many approaches have been proposed to overcome forgetting by correcting the bias; however, they still leave room for improvement in the online setting. In this paper, we try to address the bias issue with a more straightforward and more efficient method. By decomposing the dot-product logits into an angle factor and a norm factor, we empirically find that the bias problem mainly occurs in the angle factor, which can be used to learn novel knowledge as cosine logits. In contrast, the norm factor, which existing methods abandon, helps remember historical knowledge. Based on this observation, we propose to leverage the norm factor to balance new and old knowledge and thereby address the bias. To this end, we develop a heuristic approach called unbias experience replay (UER). UER learns current samples using only the angle factor and replays previous samples using both the norm and angle factors. Extensive experiments on three datasets show that UER achieves superior performance over various state-of-the-art methods. The code is available at https://github.com/FelixHuiweiLin/UER.
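The decomposition mentioned in this abstract rests on the identity w·f = ‖w‖‖f‖cos θ. The following is a minimal sketch of that identity only, not the authors' implementation. The function names are hypothetical, and taking the norm factor as the product of the feature norm and the class-weight norm is an assumption, since the abstract does not say which norms it uses.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decompose_logit(feature, class_weight):
    """Split a dot-product logit into (norm_factor, angle_factor).

    Assumption: the norm factor is ||feature|| * ||class_weight|| and the
    angle factor is the cosine between the two vectors. Both vectors are
    assumed to be non-zero.
    """
    norm_factor = math.sqrt(dot(feature, feature)) * math.sqrt(dot(class_weight, class_weight))
    angle_factor = dot(feature, class_weight) / norm_factor  # cosine logit
    return norm_factor, angle_factor


feature = [3.0, 4.0]
class_weight = [1.0, 0.0]
norm_factor, angle_factor = decompose_logit(feature, class_weight)

# Multiplying the two factors recovers the original dot-product logit.
reconstructed = norm_factor * angle_factor
```

Under this reading, a cosine-only predictor would use `angle_factor` alone, while a dot-product predictor uses the product of both factors.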
CV · Apr 10, 2023
PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
Huiwei Lin, Baoquan Zhang, Shanshan Feng et al.
Online class-incremental continual learning is a specific task of continual learning. It aims to continuously learn new classes from a data stream whose samples are seen only once, and it suffers from catastrophic forgetting, i.e., forgetting the historical knowledge of old classes. Existing replay-based methods effectively alleviate this issue by saving and replaying part of the old data in a proxy-based or contrastive-based replay manner. Although both replay manners are effective, the former is biased toward new classes because of class imbalance, and the latter is unstable and hard to converge because of the limited number of samples. In this paper, we conduct a comprehensive analysis of these two replay manners and find that they can be complementary. Inspired by this finding, we propose a novel replay-based method called proxy-based contrastive replay (PCR). The key operation is to replace the contrastive samples of anchors with the corresponding proxies in the contrastive-based way. This alleviates catastrophic forgetting by effectively addressing the imbalance issue while also keeping the model's convergence fast. We conduct extensive experiments on three real-world benchmark datasets, and the empirical results consistently demonstrate the superiority of PCR over various state-of-the-art methods.
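The key operation named in this abstract, contrasting an anchor against class proxies instead of against other samples, can be sketched as a softmax over cosine similarities between the anchor and the proxies. This is an illustrative sketch under stated assumptions rather than the paper's loss: the function names, the temperature value of 0.1, and the choice to contrast against every proxy passed in are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def proxy_contrastive_loss(anchor, label, proxies, temperature=0.1):
    """Cross-entropy of the anchor against class proxies.

    `proxies` maps a class id to that class's proxy vector. The proxy of
    the anchor's own class acts as the positive; the other proxies act as
    negatives (an assumption made for this sketch).
    """
    sims = {c: cosine(anchor, p) / temperature for c, p in proxies.items()}
    # Numerically stable log-sum-exp over all proxy similarities.
    m = max(sims.values())
    log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
    return log_denominator - sims[label]


proxies = {0: [1.0, 0.0], 1: [0.0, 1.0]}
anchor = [1.0, 0.0]
loss_matching = proxy_contrastive_loss(anchor, 0, proxies)
loss_mismatched = proxy_contrastive_loss(anchor, 1, proxies)
```

Because every class contributes exactly one proxy, the number of contrastive terms does not depend on how many samples of each class appear in the batch, which is one way to read the abstract's claim about the imbalance issue.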
LG · Jul 31, 2023
MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning
Baoquan Zhang, Chuyao Luo, Demin Yu et al.
Equipping a deep model with the ability of few-shot learning, i.e., learning quickly from only a few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address this challenge by learning how to learn novel tasks. Their key idea is to learn a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model using only a few labeled data. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes considerable memory burdens and the risk of vanishing gradients. Drawing inspiration from recent progress in diffusion models, we find that the inner-loop gradient descent process can actually be viewed as a reverse process (i.e., denoising) of diffusion, where the target of denoising is the model weights rather than the original data. Based on this fact, we propose to model the gradient descent optimizer as a diffusion model and then present a novel task-conditional diffusion-based meta-learning method, called MetaDiff, that effectively models the optimization process of model weights from Gaussian noise to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, MetaDiff does not need to differentiate through the inner-loop path, so the memory burdens and the risk of vanishing gradients can be effectively alleviated. Experimental results show that MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks.
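The analogy this abstract draws, in which inner-loop gradient descent starts from Gaussian noise and moves step by step toward target weights, can be illustrated on a toy quadratic loss. The sketch below shows only that analogy and is not MetaDiff itself: there is no learned denoiser and no task conditioning, and the function name, loss, learning rate, and step count are all hypothetical choices.

```python
import random


def inner_loop_from_noise(target, steps=50, lr=0.2, seed=0):
    """Run gradient descent on 0.5 * ||w - target||^2 from Gaussian noise.

    Returns the final weights and the distance to the target after each
    step, so the trajectory can be read as a sequence of progressively
    "cleaner" weight vectors.
    """
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    distances = []
    for _ in range(steps):
        grad = [w - t for w, t in zip(weights, target)]  # gradient of the toy loss
        weights = [w - lr * g for w, g in zip(weights, grad)]
        distances.append(sum((w - t) ** 2 for w, t in zip(weights, target)) ** 0.5)
    return weights, distances


target_weights = [0.5, -1.0, 2.0]
final_weights, distances = inner_loop_from_noise(target_weights)
```

In this toy setting each step shrinks the distance to the target by a constant factor of `1 - lr`, which loosely mirrors how a reverse diffusion process removes noise a little at a time.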
LG · Sep 26, 2023
HPCR: Holistic Proxy-based Contrastive Replay for Online Continual Learning
Huiwei Lin, Shanshan Feng, Baoquan Zhang et al.
Online continual learning, which aims to develop a neural network that continuously learns new data from a single pass over an online data stream, generally suffers from catastrophic forgetting. Existing replay-based methods alleviate forgetting by replaying part of the old data in a proxy-based or contrastive-based replay manner, each with its own shortcomings. Our previous work proposed a novel replay-based method called proxy-based contrastive replay (PCR), which handles these shortcomings by combining the complementary advantages of both replay manners. In this work, we further conduct a gradient and limitation analysis of PCR. The analysis shows that PCR can still be improved in the feature extraction, generalization, and anti-forgetting capabilities of the model. Hence, we develop a more advanced method named holistic proxy-based contrastive replay (HPCR). HPCR consists of three components, each tackling one of the limitations of PCR. The first is a contrastive component that conditionally incorporates anchor-to-sample pairs into PCR, improving the feature extraction ability. The second is a temperature component that decouples the temperature coefficient into two parts based on their gradient impacts and sets different values for them to enhance the generalization ability. The third is a distillation component that constrains the learning process with additional loss terms to improve the anti-forgetting ability. Experiments on four datasets consistently demonstrate the superiority of HPCR over various state-of-the-art methods.
LG · Jul 17, 2024
ER-FSL: Experience Replay with Feature Subspace Learning for Online Continual Learning
Huiwei Lin
Online continual learning (OCL) involves deep neural networks retaining knowledge from old data while adapting to new data, which is accessible only once. A critical challenge in OCL is catastrophic forgetting, reflected in reduced model performance on old data. Existing replay-based methods mitigate forgetting by replaying buffered samples from old data while learning current samples of new data. In this work, we dissect existing methods and empirically discover that learning and replaying in the same feature space is not conducive to addressing the forgetting issue: because of data imbalance, the learned features associated with old data are readily changed by the features related to new data, which leads to forgetting. Based on this observation, we explore learning and replaying in different feature spaces. Learning in a feature subspace is sufficient to capture novel knowledge from new data, while replaying in a larger feature space provides more room to maintain historical knowledge from old data. To this end, we propose a novel OCL approach called experience replay with feature subspace learning (ER-FSL). First, ER-FSL divides the entire feature space into multiple subspaces, with each subspace used to learn current samples; it also introduces a subspace reuse mechanism to handle situations where no blank subspaces remain. Second, ER-FSL replays previous samples using an accumulated space comprising all learned subspaces. Extensive experiments on three datasets demonstrate the superiority of ER-FSL over various state-of-the-art methods.
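The two steps described in this abstract, learning in one subspace and replaying in the accumulated space, can be sketched as index bookkeeping over the feature dimensions. This is a minimal sketch under stated assumptions, not the author's implementation: the function names are hypothetical, subspaces are taken to be equal-sized contiguous blocks of dimensions, and the reuse mechanism is modeled as simple round-robin wrapping because the abstract does not specify how reuse works.

```python
def split_into_subspaces(feature_dim, num_subspaces):
    """Partition dimension indices into equal-sized contiguous blocks."""
    size = feature_dim // num_subspaces
    return [list(range(i * size, (i + 1) * size)) for i in range(num_subspaces)]


def learning_dims(subspaces, task_id):
    """Dimensions used to learn current samples of task `task_id`.

    Assumption: once every subspace has been used, earlier subspaces are
    reused in round-robin order.
    """
    return subspaces[task_id % len(subspaces)]


def replay_dims(subspaces, tasks_seen):
    """Accumulated space: the union of all subspaces learned so far."""
    used = min(tasks_seen, len(subspaces))
    return [d for block in subspaces[:used] for d in block]


def masked_logit(feature, class_weight, dims):
    """Dot-product logit restricted to the given dimensions."""
    return sum(feature[d] * class_weight[d] for d in dims)


subspaces = split_into_subspaces(feature_dim=8, num_subspaces=4)
current_dims = learning_dims(subspaces, task_id=1)   # subspace for the second task
buffer_dims = replay_dims(subspaces, tasks_seen=2)   # space used for replayed samples

feature = [1.0] * 8
class_weight = [0.5] * 8
current_logit = masked_logit(feature, class_weight, current_dims)
replay_logit = masked_logit(feature, class_weight, buffer_dims)
```

With this layout, current samples only update the dimensions of one block, while replayed samples are scored over every block learned so far.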