LGJul 4, 2022Code
Task-oriented Self-supervised Learning for Anomaly Detection in ElectroencephalographyYaojia Zheng, Zhouwu Liu, Rong Mo et al.
Accurate automated analysis of electroencephalography (EEG) would largely help clinicians effectively monitor and diagnose patients with various brain diseases. Compared to supervised learning with labelled disease EEG data which can train a model to analyze specific diseases but would fail to monitor previously unseen statuses, anomaly detection based on only normal EEGs can detect any potential anomaly in new EEGs. Different from existing anomaly detection strategies which do not consider any property of unavailable abnormal data during model development, a task-oriented self-supervised learning approach is proposed here which makes use of available normal EEGs and expert knowledge about abnormal EEGs to train a more effective feature extractor for the subsequent development of anomaly detector. In addition, a specific two branch convolutional neural network with larger kernels is designed as the feature extractor such that it can more easily extract both larger scale and small-scale features which often appear in unavailable abnormal EEGs. The effectively designed and trained feature extractor has shown to be able to extract better feature representations from EEGs for development of anomaly detector based on normal data and future anomaly detection for new EEGs, as demonstrated on three EEG datasets. The code is available at https://github.com/ironing/EEG-AD.
LGApr 28, 2022
Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature ExtractorYang Yang, Zhiying Cui, Junjie Xu et al.
Deep learning has shown its human-level performance in various applications. However, current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. This poses a challenge particularly in intelligent diagnosis systems where initially only training data of a limited number of diseases are available. In this case, updating the intelligent system with data of new diseases would inevitably downgrade its performance on previously learned diseases. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. with Gaussian mixture models, and naturally kept from forgetting in continual learning over time. Unlike existing class-incremental learning methods, the proposed approach is not sensitive to the continual learning process and can be additionally well applied to the data-incremental learning scenario. Experiments on multiple medical and natural image classification tasks showed that the proposed approach outperforms state-of-the-art approaches which even keep some images of old classes during continual learning of new classes.
CVApr 18, 2023Code
Adapter Learning in Pretrained Feature Extractor for Continual Learning of DiseasesWentao Zhang, Yujun Huang, Tong Zhang et al.
Currently intelligent diagnosis systems lack the ability of continually learning to diagnose new diseases once deployed, under the condition of preserving old disease knowledge. In particular, updating an intelligent diagnosis system with training data of new diseases would cause catastrophic forgetting of old disease knowledge. To address the catastrophic forgetting issue, an Adapter-based Continual Learning framework called ACL is proposed to help effectively learn a set of new diseases at each round (or task) of continual learning, without changing the shared feature extractor. The learnable lightweight task-specific adapter(s) can be flexibly designed (e.g., two convolutional layers) and then added to the pretrained and fixed feature extractor. Together with a specially designed task-specific head which absorbs all previously learned old diseases as a single "out-of-distribution" category, task-specific adapter(s) can help the pretrained feature extractor more effectively extract discriminative features between diseases. In addition, a simple yet effective fine-tuning is applied to collaboratively fine-tune multiple task-specific heads such that outputs from different heads are comparable and consequently the appropriate classifier head can be more accurately selected during model inference. Extensive empirical evaluations on three image datasets demonstrate the superior performance of ACL in continual learning of new diseases. The source code is available at https://github.com/GiantJun/CL_Pytorch.
CVMar 15, 2022
SATS: Self-Attention Transfer for Continual Semantic SegmentationYiqiao Qiu, Yixing Shen, Zhuohao Sun et al.
Continually learning to segment more and more types of image regions is a desired capability for many intelligent systems. However, such continual semantic segmentation suffers from the same catastrophic forgetting issue as in continual classification learning. While multiple knowledge distillation strategies originally for continual classification have been well adapted to continual semantic segmentation, they only consider transferring old knowledge based on the outputs from one or more layers of deep fully convolutional networks. Different from existing solutions, this study proposes to transfer a new type of information relevant to knowledge, i.e. the relationships between elements (Eg. pixels or small local regions) within each image which can capture both within-class and between-class knowledge. The relationship information can be effectively obtained from the self-attention maps in a Transformer-style segmentation model. Considering that pixels belonging to the same class in each image often share similar visual properties, a class-specific region pooling is applied to provide more efficient relationship information for knowledge transfer. Extensive evaluations on multiple public benchmarks support that the proposed self-attention transfer method can further effectively alleviate the catastrophic forgetting issue, and its flexible combination with one or more widely adopted strategies significantly outperforms state-of-the-art solutions.
CLApr 17Code
TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language ModelsJinlun Ye, Jiang Liao, Runhe Lai et al.
Vision-language models (VLMs) such as CLIP exhibit strong Out-of-distribution (OOD) detection capabilities by aligning visual and textual representations. Recent CLIP-based test-time adaptation methods further improve detection performance by incorporating external OOD labels. However, such labels are finite and fixed, while the real OOD semantic space is inherently open-ended. Consequently, fixed labels fail to represent the diverse and evolving OOD semantics encountered in test streams. To address this limitation, we introduce Test-time Textual Learning (TTL), a framework that dynamically learns OOD textual semantics from unlabeled test streams, without relying on external OOD labels. TTL updates learnable prompts using pseudo-labeled test samples to capture emerging OOD knowledge. To suppress noise introduced by pseudo-labels, we introduce an OOD knowledge purification strategy that selects reliable OOD samples for adaptation while suppressing noise. In addition, TTL maintains an OOD Textual Knowledge Bank that stores high-quality textual features, providing stable score calibration across batches. Extensive experiments on two standard benchmarks with nine OOD datasets demonstrate that TTL consistently achieves state-of-the-art performance, highlighting the value of textual adaptation for robust test-time OOD detection. Our code is available at https://github.com/figec/TTL.
CVJul 11, 2022
PCCT: Progressive Class-Center Triplet Loss for Imbalanced Medical Image ClassificationKanghao Chen, Weixian Lei, Rong Zhang et al.
Imbalanced training data is a significant challenge for medical image classification. In this study, we propose a novel Progressive Class-Center Triplet (PCCT) framework to alleviate the class imbalance issue particularly for diagnosis of rare diseases, mainly by carefully designing the triplet sampling strategy and the triplet loss formation. Specifically, the PCCT framework includes two successive stages. In the first stage, PCCT trains the diagnosis system via a class-balanced triplet loss to coarsely separate distributions of different classes. In the second stage, the PCCT framework further improves the diagnosis system via a class-center involved triplet loss to cause a more compact distribution for each class. For the class-balanced triplet loss, triplets are sampled equally for each class at each training iteration, thus alleviating the imbalanced data issue. For the class-center involved triplet loss, the positive and negative samples in each triplet are replaced by their corresponding class centers, which enforces data representations of the same class closer to the class center. Furthermore, the class-center involved triplet loss is extended to the pair-wise ranking loss and the quadruplet loss, which demonstrates the generalization of the proposed framework. Extensive experiments support that the PCCT framework works effectively for medical image classification with imbalanced training images. On two skin image datasets and one chest X-ray dataset, the proposed approach respectively obtains the mean F1 score 86.2, 65.2, and 90.66 over all classes and 81.4, 63.87, and 81.92 for rare classes, achieving state-of-the-art performance and outperforming the widely used methods for the class imbalance issue.
CVJul 19, 2023
Class Attention to Regions of Lesion for Imbalanced Medical Image RecognitionJia-Xin Zhuang, Jiabin Cai, Jianguo Zhang et al.
Automated medical image classification is the key component in intelligent diagnosis systems. However, most medical image datasets contain plenty of samples of common diseases and just a handful of rare ones, leading to major class imbalances. Currently, it is an open problem in intelligent diagnosis to effectively learn from imbalanced training data. In this paper, we propose a simple yet effective framework, named \textbf{C}lass \textbf{A}ttention to \textbf{RE}gions of the lesion (CARE), to handle data imbalance issues by embedding attention into the training process of \textbf{C}onvolutional \textbf{N}eural \textbf{N}etworks (CNNs). The proposed attention module helps CNNs attend to lesion regions of rare diseases, therefore helping CNNs to learn their characteristics more effectively. In addition, this attention module works only during the training phase and does not change the architecture of the original network, so it can be directly combined with any existing CNN architecture. The CARE framework needs bounding boxes to represent the lesion regions of rare diseases. To alleviate the need for manual annotation, we further developed variants of CARE by leveraging the traditional saliency methods or a pretrained segmentation model for bounding box generation. Results show that the CARE variants with automated bounding box generation are comparable to the original CARE framework with \textit{manual} bounding box annotations. A series of experiments on an imbalanced skin image dataset and a pneumonia dataset indicates that our method can effectively help the network focus on the lesion regions of rare diseases and remarkably improves the classification performance of rare diseases.
LGMay 12Code
Instruction Lens Score: Your Instruction Contributes a Powerful Object Hallucination Detector for Multimodal Large Language ModelsRunhe Lai, Xinhua Lu, Yanqi Wu et al.
Multimodal large language models (MLLMs) have achieved remarkable progress, yet the object hallucination remains a critical challenge for reliable deployment. In this paper, we present an in-depth analysis of instruction token embeddings and reveal that they implicitly encode visual information while effectively filtering erroneous information introduced by misleading visual embeddings. Building on this insight, we propose the Instruction Lens Score (InsLen), which combines a Calibrated Local Score with a Context Consistency Score that measures context consistency of the object tokens. The proposed approach serves as a plug-and-play object hallucination detector without relying on auxiliary models or additional training. Extensive experiments across multiple benchmarks and diverse MLLM architectures demonstrate that InsLen consistently outperforms existing hallucination detection methods, highlighting its effectiveness and robustness. The code is available at https://github.com/Fraserlairh/Instruction-Lens-Score.
LGDec 7, 2022
PyGFI: Analyzing and Enhancing Robustness of Graph Neural Networks Against Hardware ErrorsRuixuan Wang, Fred Lin, Daniel Moore et al.
Graph neural networks (GNNs) have recently emerged as a promising learning paradigm in learning graph-structured data and have demonstrated wide success across various domains such as recommendation systems, social networks, and electronic design automation (EDA). Like other deep learning (DL) methods, GNNs are being deployed in sophisticated modern hardware systems, as well as dedicated accelerators. However, despite the popularity of GNNs and the recent efforts of bringing GNNs to hardware, the fault tolerance and resilience of GNNs have generally been overlooked. Inspired by the inherent algorithmic resilience of DL methods, this paper conducts, for the first time, a large-scale and empirical study of GNN resilience, aiming to understand the relationship between hardware faults and GNN accuracy. By developing a customized fault injection tool on top of PyTorch, we perform extensive fault injection experiments on various GNN models and application datasets. We observe that the error resilience of GNN models varies by orders of magnitude with respect to different models and application datasets. Further, we explore a low-cost error mitigation mechanism for GNN to enhance its resilience. This GNN resilience study aims to open up new directions and opportunities for future GNN accelerator design and architectural optimization.
CVJul 14, 2022
Learning Discriminative Representation via Metric Learning for Imbalanced Medical Image ClassificationChenghua Zeng, Huijuan Lu, Kanghao Chen et al.
Data imbalance between common and rare diseases during model training often causes intelligent diagnosis systems to have biased predictions towards common diseases. The state-of-the-art approaches apply a two-stage learning framework to alleviate the class-imbalance issue, where the first stage focuses on training of a general feature extractor and the second stage focuses on fine-tuning the classifier head for class rebalancing. However, existing two-stage approaches do not consider the fine-grained property between different diseases, often causing the first stage less effective for medical image classification than for natural image classification tasks. In this study, we propose embedding metric learning into the first stage of the two-stage framework specially to help the feature extractor learn to extract more discriminative feature representations. Extensive experiments mainly on three medical image datasets show that the proposed approach consistently outperforms existing onestage and two-stage approaches, suggesting that metric learning can be used as an effective plug-in component in the two-stage framework for fine-grained class-imbalanced image classification tasks.
CVJan 18, 2023
Adaptively Integrated Knowledge Distillation and Prediction Uncertainty for Continual LearningKanghao Chen, Sijia Liu, Ruixuan Wang et al.
Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge. Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity). However, the stability-plasticity trade-off during continual learning may need to be dynamically changed for better model performance. In this paper, we propose two novel ways to adaptively balance model stability and plasticity. The first one is to adaptively integrate multiple levels of old knowledge and transfer it to each block level in the new model. The second one uses prediction uncertainty of old knowledge to naturally tune the importance of learning new knowledge during model training. To our best knowledge, this is the first time to connect model prediction uncertainty and knowledge distillation for continual learning. In addition, this paper applies a modified CutMix particularly to augment the data for old knowledge, further alleviating the catastrophic forgetting issue. Extensive evaluations on the CIFAR100 and the ImageNet datasets confirmed the effectiveness of the proposed method for continual learning.
NEMar 25, 2022
EnHDC: Ensemble Learning for Brain-Inspired Hyperdimensional ComputingRuixuan Wang, Dongning Ma, Xun Jiao
Ensemble learning is a classical learning method utilizing a group of weak learners to form a strong learner, which aims to increase the accuracy of the model. Recently, brain-inspired hyperdimensional computing (HDC) becomes an emerging computational paradigm that has achieved success in various domains such as human activity recognition, voice recognition, and bio-medical signal classification. HDC mimics the brain cognition and leverages high-dimensional vectors (e.g., 10000 dimensions) with fully distributed holographic representation and (pseudo-)randomness. This paper presents the first effort in exploring ensemble learning in the context of HDC and proposes the first ensemble HDC model referred to as EnHDC. EnHDC uses a majority voting-based mechanism to synergistically integrate the prediction outcomes of multiple base HDC classifiers. To enhance the diversity of base classifiers, we vary the encoding mechanisms, dimensions, and data width settings among base classifiers. By applying EnHDC on a wide range of applications, results show that the EnHDC can achieve on average 3.2\% accuracy improvement over a single HDC classifier. Further, we show that EnHDC with reduced dimensionality, e.g., 1000 dimensions, can achieve similar or even surpass the accuracy of baseline HDC with higher dimensionality, e.g., 10000 dimensions. This leads to a 20\% reduction of storage requirement of HDC model, which is key to enabling HDC on low-power computing platforms.
CVOct 27, 2023
Classifier-head Informed Feature Masking and Prototype-based Logit Smoothing for Out-of-Distribution DetectionZhuohao Sun, Yiqiao Qiu, Zhijun Tan et al.
Out-of-distribution (OOD) detection is essential when deploying neural networks in the real world. One main challenge is that neural networks often make overconfident predictions on OOD data. In this study, we propose an effective post-hoc OOD detection method based on a new feature masking strategy and a novel logit smoothing strategy. Feature masking determines the important features at the penultimate layer for each in-distribution (ID) class based on the weights of the ID class in the classifier head and masks the rest features. Logit smoothing computes the cosine similarity between the feature vector of the test sample and the prototype of the predicted ID class at the penultimate layer and uses the similarity as an adaptive temperature factor on the logit to alleviate the network's overconfidence prediction for OOD data. With these strategies, we can reduce feature activation of OOD data and enlarge the gap in OOD score between ID and OOD data. Extensive experiments on multiple standard OOD detection benchmarks demonstrate the effectiveness of our method and its compatibility with existing methods, with new state-of-the-art performance achieved from our method. The source code will be released publicly.
CVFeb 7, 2023
PAMI: partition input and aggregate outputs for model interpretationWei Shi, Wentao Zhang, Weishi Zheng et al.
There is an increasing demand for interpretation of model predictions especially in high-risk applications. Various visualization approaches have been proposed to estimate the part of input which is relevant to a specific model prediction. However, most approaches require model structure and parameter details in order to obtain the visualization results, and in general much effort is required to adapt each approach to multiple types of tasks particularly when model backbone and input format change over tasks. In this study, a simple yet effective visualization framework called PAMI is proposed based on the observation that deep learning models often aggregate features from local regions for model predictions. The basic idea is to mask majority of the input and use the corresponding model output as the relative contribution of the preserved input part to the original model prediction. For each input, since only a set of model outputs are collected and aggregated, PAMI does not require any model detail and can be applied to various prediction tasks with different model backbones and input formats. Extensive experiments on multiple tasks confirm the proposed method performs better than existing visualization approaches in more precisely finding class-specific input regions, and when applied to different model backbones and input formats. The source code will be released publicly.
CVNov 22, 2024Code
FodFoM: Fake Outlier Data by Foundation Models Creates Stronger Visual Out-of-Distribution DetectorJiankang Chen, Ling Deng, Zhiyong Gan et al.
Out-of-Distribution (OOD) detection is crucial when deploying machine learning models in open-world applications. The core challenge in OOD detection is mitigating the model's overconfidence on OOD data. While recent methods using auxiliary outlier datasets or synthesizing outlier features have shown promising OOD detection performance, they are limited due to costly data collection or simplified assumptions. In this paper, we propose a novel OOD detection framework FodFoM that innovatively combines multiple foundation models to generate two types of challenging fake outlier images for classifier training. The first type is based on BLIP-2's image captioning capability, CLIP's vision-language knowledge, and Stable Diffusion's image generation ability. Jointly utilizing these foundation models constructs fake outlier images which are semantically similar to but different from in-distribution (ID) images. For the second type, GroundingDINO's object detection ability is utilized to help construct pure background images by blurring foreground ID objects in ID images. The proposed framework can be flexibly combined with multiple existing OOD detection methods. Extensive empirical evaluations show that image classifiers with the help of constructed fake images can more accurately differentiate real OOD images from ID ones. New state-of-the-art OOD detection performance is achieved on multiple benchmarks. The code is available at \url{https://github.com/Cverchen/ACMMM2024-FodFoM}.
CVNov 22, 2024Code
TagFog: Textual Anchor Guidance and Fake Outlier Generation for Visual Out-of-Distribution DetectionJiankang Chen, Tong Zhang, Wei-Shi Zheng et al.
Out-of-distribution (OOD) detection is crucial in many real-world applications. However, intelligent models are often trained solely on in-distribution (ID) data, leading to overconfidence when misclassifying OOD data as ID classes. In this study, we propose a new learning framework which leverage simple Jigsaw-based fake OOD data and rich semantic embeddings (`anchors') from the ChatGPT description of ID knowledge to help guide the training of the image encoder. The learning framework can be flexibly combined with existing post-hoc approaches to OOD detection, and extensive empirical evaluations on multiple OOD detection benchmarks demonstrate that rich textual representation of ID knowledge and fake OOD knowledge can well help train a visual encoder for OOD detection. With the learning framework, new state-of-the-art performance was achieved on all the benchmarks. The code is available at \url{https://github.com/Cverchen/TagFog}.
CVApr 20
Concept-wise Attention for Fine-grained Concept Bottleneck ModelsMinghong Zhong, Guoshuai Zou, Kanghao Chen et al.
Recently impressive performance has been achieved in Concept Bottleneck Models (CBM) by utilizing the image-text alignment learned by a large pre-trained vision-language model (i.e. CLIP). However, there exist two key limitations in concept modeling. Existing methods often suffer from pre-training biases, manifested as granularity misalignment or reliance on structural priors. Moreover, fine-tuning with Binary Cross-Entropy (BCE) loss treats each concept independently, which ignores mutual exclusivity among concepts, leading to suboptimal alignment. To address these limitations, we propose Concept-wise Attention for Fine-grained Concept Bottleneck Models (CoAt-CBM), a novel framework that achieves adaptive fine-grained image-concept alignment and high interpretability. Specifically, CoAt-CBM employs learnable concept-wise visual queries to adaptively obtain fine-grained concept-wise visual embeddings, which are then used to produce a concept score vector. Then, a novel concept contrastive optimization guides the model to handle the relative importance of the concept scores, enabling concept predictions to faithfully reflect the image content and improved alignment. Extensive experiments demonstrate that CoAt-CBM consistently outperforms state-of-the-art methods. The codes will be available upon acceptance.
CVAug 7, 2025Code
Decoupling Continual Semantic SegmentationYifu Guo, Yuquan Lu, Wentao Zhang et al.
Continual Semantic Segmentation (CSS) requires learning new classes without forgetting previously acquired knowledge, addressing the fundamental challenge of catastrophic forgetting in dense prediction tasks. However, existing CSS methods typically employ single-stage encoder-decoder architectures where segmentation masks and class labels are tightly coupled, leading to interference between old and new class learning and suboptimal retention-plasticity balance. We introduce DecoupleCSS, a novel two-stage framework for CSS. By decoupling class-aware detection from class-agnostic segmentation, DecoupleCSS enables more effective continual learning, preserving past knowledge while learning new classes. The first stage leverages pre-trained text and image encoders, adapted using LoRA, to encode class-specific information and generate location-aware prompts. In the second stage, the Segment Anything Model (SAM) is employed to produce precise segmentation masks, ensuring that segmentation knowledge is shared across both new and previous classes. This approach improves the balance between retention and adaptability in CSS, achieving state-of-the-art performance across a variety of challenging tasks. Our code is publicly available at: https://github.com/euyis1019/Decoupling-Continual-Semantic-Segmentation.
CVJul 6, 2025Code
FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution DetectionXinhua Lu, Runhe Lai, Yanqi Wu et al.
Pre-trained vision-language models (VLMs) have advanced out-of-distribution (OOD) detection recently. However, existing CLIP-based methods often focus on learning OOD-related knowledge to improve OOD detection, showing limited generalization or reliance on external large-scale auxiliary datasets. In this study, instead of delving into the intricate OOD-related knowledge, we propose an innovative CLIP-based framework based on Forced prompt leArning (FA), designed to make full use of the In-Distribution (ID) knowledge and ultimately boost the effectiveness of OOD detection. Our key insight is to learn a prompt (i.e., forced prompt) that contains more diversified and richer descriptions of the ID classes beyond the textual semantics of class labels. Specifically, it promotes better discernment for ID images, by forcing more notable semantic similarity between ID images and the learnable forced prompt. Moreover, we introduce a forced coefficient, encouraging the forced prompt to learn more comprehensive and nuanced descriptions of the ID classes. In this way, FA is capable of achieving notable improvements in OOD detection, even when trained without any external auxiliary datasets, while maintaining an identical number of trainable parameters as CoOp. Extensive empirical evaluations confirm our method consistently outperforms current state-of-the-art methods. Code is available at https://github.com/0xFAFA/FA.
CVAug 25, 2025Code
Hierarchical Vision-Language Learning for Medical Out-of-Distribution DetectionRunhe Lai, Xinhua Lu, Kanghao Chen et al.
In trustworthy medical diagnosis systems, integrating out-of-distribution (OOD) detection aims to identify unknown diseases in samples, thereby mitigating the risk of misdiagnosis. In this study, we propose a novel OOD detection framework based on vision-language models (VLMs), which integrates hierarchical visual information to cope with challenging unknown diseases that resemble known diseases. Specifically, a cross-scale visual fusion strategy is proposed to couple visual embeddings from multiple scales. This enriches the detailed representation of medical images and thus improves the discrimination of unknown diseases. Moreover, a cross-scale hard pseudo-OOD sample generation strategy is proposed to benefit OOD detection maximally. Experimental evaluations on three public medical datasets support that the proposed framework achieves superior OOD detection performance compared to existing methods. The source code is available at https://openi.pcl.ac.cn/OpenMedIA/HVL.
CVMar 14
Zero-Forgetting CISS via Dual-Phase Cognitive CascadesYuquan Lu, Yifu Guo, Zishan Xu et al.
Continual semantic segmentation (CSS) is a cornerstone task in computer vision that enables a large number of downstream applications, but faces the catastrophic forgetting challenge. In conventional class-incremental semantic segmentation (CISS) frameworks using Softmax-based classification heads, catastrophic forgetting originates from Catastrophic forgetting and task affiliation probability. We formulate these problems and provide a theoretical analysis to more deeply understand the limitations in existing CISS methods, particularly Strict Parameter Isolation (SPI). To address these challenges, we follow a dual-phase intuition from human annotators, and introduce Cognitive Cascade Segmentation (CogCaS), a novel dual-phase cascade formulation for CSS tasks in the CISS setting. By decoupling the task into class-existence detection and class-specific segmentation, CogCaS enables more effective continual learning, preserving previously learned knowledge while incorporating new classes. Using two benchmark datasets PASCAL VOC 2012 and ADE20K, we have shown significant improvements in a variety of challenging scenarios, particularly those with long sequence of incremental tasks, when compared to exsiting state-of-the-art methods. Our code will be made publicly available upon paper acceptance.
CVDec 10, 2025
Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language ModelJiantao Tan, Peixian Ma, Tong Yu et al.
Class-incremental learning requires a learning system to continually learn knowledge of new classes and meanwhile try to preserve previously learned knowledge of old classes. As current state-of-the-art methods based on Vision-Language Models (VLMs) still suffer from the issue of differentiating classes across learning tasks. Here a novel VLM-based continual learning framework for image classification is proposed. In this framework, task-specific adapters are added to the pre-trained and frozen image encoder to learn new knowledge, and a novel cross-task representation calibration strategy based on a mixture of light-weight projectors is used to help better separate all learned classes in a unified feature space, alleviating class confusion across tasks. In addition, a novel inference strategy guided by prediction uncertainty is developed to more accurately select the most appropriate image feature for class prediction. Extensive experiments on multiple datasets under various settings demonstrate the superior performance of our method compared to existing ones.
CVApr 26
DynProto: Dynamic Prototype Evolution for Out-of-Distribution DetectionYanqi Wu, Xinhua Lu, Runhe Lai et al.
Recent studies show that using potential out-of-distribution (OOD) labels from large corpora as auxiliary information can improve OOD detection in vision-language models (VLMs). However, these methods often fail when real-world OOD samples fall outside the predefined OOD label set. To address this limitation, we propose DynProto, a novel approach that learns OOD prototypes dynamically during testing using only in-distribution (ID) information. DynProto is inspired by a key observation: OOD samples predicted as the same ID class tend to cluster in the feature space. With this insight, we leverage easy-to-detect OOD samples as ``anchors'' to find their harder-to-detect, similar counterparts. To this end, DynProto introduces two modules: \textbf{Coarse OOD Pattern Capturing Module} caches OOD patterns that are easily confused with each ID class during testing, and \textbf{Fine-grained OOD Pattern Refinement Module} subsequently clusters these patterns within each cache and aggregates them into representative OOD prototypes. By measuring similarity to ID and dynamic OOD prototypes, DynProto enables accurate OOD detection. DynProto significantly outperforms prior methods across multiple benchmarks. On ImageNet OOD benchmark, DynProto reduces FPR95 by 11.60\% and improves AUROC by 4.70\%. Moreover, the framework is architecture-agnostic and can be integrated into various backbones.
CVFeb 28, 2024
Generalizable Two-Branch Framework for Image Class-Incremental LearningChao Wu, Xiaobin Chang, Ruixuan Wang
Deep neural networks often severely forget previously learned knowledge when learning new knowledge. Various continual learning (CL) methods have been proposed to handle such a catastrophic forgetting issue from different perspectives and achieved substantial improvements. In this paper, a novel two-branch continual learning framework is proposed to further enhance most existing CL methods. Specifically, the main branch can be any existing CL model and the newly introduced side branch is a lightweight convolutional network. The output of each main branch block is modulated by the output of the corresponding side branch block. Such a simple two-branch model can then be easily implemented and learned with the vanilla optimization setting without whistles and bells. Extensive experiments with various settings on multiple image datasets show that the proposed framework yields consistent improvements over state-of-the-art methods.
LGNov 1, 2024
Class Incremental Learning with Task-Specific Batch Normalization and Out-of-Distribution DetectionXuchen Xie, Yiqiao Qiu, Run Lin et al.
This study focuses on incremental learning for image classification, exploring how to reduce catastrophic forgetting of all learned knowledge when access to old data is restricted due to memory or privacy constraints. The challenge of incremental learning lies in achieving an optimal balance between plasticity, the ability to learn new knowledge, and stability, the ability to retain old knowledge. Based on whether the task identifier (task-ID) of an image can be obtained during the test stage, incremental learning for image classifcation is divided into two main paradigms, which are task incremental learning (TIL) and class incremental learning (CIL). The TIL paradigm has access to the task-ID, allowing it to use multiple task-specific classification heads selected based on the task-ID. Consequently, in CIL, where the task-ID is unavailable, TIL methods must predict the task-ID to extend their application to the CIL paradigm. Our previous method for TIL adds task-specific batch normalization and classification heads incrementally. This work extends the method by predicting task-ID through an "unknown" class added to each classification head. The head with the lowest "unknown" probability is selected, enabling task-ID prediction and making the method applicable to CIL. The task-specific batch normalization (BN) modules effectively adjust the distribution of output feature maps across different tasks, enhancing the model's plasticity.Moreover, since BN has much fewer parameters compared to convolutional kernels, by only modifying the BN layers as new tasks arrive, the model can effectively manage parameter growth while ensuring stability across tasks. The innovation of this study lies in the first-time introduction of task-specific BN into CIL and verifying the feasibility of extending TIL methods to CIL through task-ID prediction with state-of-the-art performance on multiple datasets.
CVFeb 6, 2024
Intensive Vision-guided Network for Radiology Report GenerationFudan Zheng, Mengfei Li, Ying Wang et al.
Automatic radiology report generation is booming due to its huge application potential for the healthcare industry. However, existing computer vision and natural language processing approaches to tackle this problem are limited in two aspects. First, when extracting image features, most of them neglect multi-view reasoning in vision and model single-view structure of medical images, such as space-view or channel-view. However, clinicians rely on multi-view imaging information for comprehensive judgment in daily clinical diagnosis. Second, when generating reports, they overlook context reasoning with multi-modal information and focus on pure textual optimization utilizing retrieval-based methods. We aim to address these two issues by proposing a model that better simulates clinicians' perspectives and generates more accurate reports. Given the above limitation in feature extraction, we propose a Globally-intensive Attention (GIA) module in the medical image encoder to simulate and integrate multi-view vision perception. GIA aims to learn three types of vision perception: depth view, space view, and pixel view. On the other hand, to address the above problem in report generation, we explore how to involve multi-modal signals to generate precisely matched reports, i.e., how to integrate previously predicted words with region-aware visual content in next word prediction. Specifically, we design a Visual Knowledge-guided Decoder (VKGD), which can adaptively consider how much the model needs to rely on visual information and previously predicted text to assist next word prediction. Hence, our final Intensive Vision-guided Network (IVGN) framework includes a GIA-guided Visual Encoder and the VKGD. Experiments on two commonly-used datasets IU X-Ray and MIMIC-CXR demonstrate the superior ability of our method compared with other state-of-the-art approaches.
CVDec 25, 2023
A Target Detection Algorithm in Traffic Scenes Based on Deep Reinforcement LearningXinyu Ren, Ruixuan Wang
This research presents a novel active detection model utilizing deep reinforcement learning to accurately detect traffic objects in real-world scenarios. The model employs a deep Q-network based on LSTM-CNN that identifies and aligns target zones with specific categories of traffic objects through implementing a top-down approach with efficient feature extraction of the environment. The model integrates historical and current actions and observations to make a comprehensive analysis. The design of the state space and reward function takes into account the impact of time steps to enable the model to complete the task in fewer steps. Tests conducted demonstrate the model's proficiency, exhibiting exceptional precision and performance in locating traffic signal lights and speed limit signs. The findings of this study highlight the efficacy and potential of the deep reinforcement learning-based active detection model in traffic-related applications, underscoring its robust detection abilities and promising performance.
CVOct 14, 2025
Local Background Features Matter in Out-of-Distribution DetectionJinlun Ye, Zhuohao Sun, Yiqiao Qiu et al.
Out-of-distribution (OOD) detection is crucial when deploying deep neural networks in the real world to ensure the reliability and safety of their applications. One main challenge in OOD detection is that neural network models often produce overconfident predictions on OOD data. While some methods using auxiliary OOD datasets or generating fake OOD images have shown promising OOD detection performance, they are limited by the high costs of data collection and training. In this study, we propose a novel and effective OOD detection method that utilizes local background features as fake OOD features for model training. Inspired by the observation that OOD images generally share similar background regions with ID images, the background features are extracted from ID images as simulated OOD visual representations during training based on the local invariance of convolution. Through being optimized to reduce the $L_2$-norm of these background features, the neural networks are able to alleviate the overconfidence issue on OOD data. Extensive experiments on multiple standard OOD detection benchmarks confirm the effectiveness of our method and its wide combinatorial compatibility with existing post-hoc methods, with new state-of-the-art performance achieved from our method.
CVAug 18, 2025
Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot LearningDexia Chen, Qianjie Zhu, Weibing Li et al.
Pretrained vision-language models (VLMs), such as CLIP, have shown remarkable potential in few-shot image classification and led to numerous effective transfer learning strategies. These methods leverage the pretrained knowledge of VLMs to enable effective domain adaptation while mitigating overfitting through parameter-efficient tuning or instance-based consistency constraints. However, such regularizations often neglect the geometric structure of data distribution, which may lead to distortion of the overall semantic representation. To overcome this limitation, we propose a novel fine-tuning method, Manifold-Preserving and Sculpting Tuning (MPS-Tuning). Regarding the data distribution in feature space as a semantic manifold, MPS-Tuning explicitly constrains the intrinsic geometry of this manifold while further sculpting it to enhance class separability. Specifically, MPS-Tuning preserves both macroscopic and microscopic topological structures of the original manifold by aligning Gram matrices of features before and after fine-tuning. Theoretically, this constraint is shown to approximate an upper bound of the Gromov-Wasserstein distance. Furthermore, features from the image and text modalities are paired, and pairwise similarities are optimized to enhance the manifold's class discriminability. Extensive experiments demonstrate that MPS-Tuning significantly improves model performance while effectively preserving the structure of the semantic manifold. The code will be released.
CVAug 18, 2025
Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language ModelsDexia Chen, Wentao Zhang, Qianjie Zhu et al.
Vision-language models (VLMs) pre-trained on natural image and language data, such as CLIP, have exhibited significant potential in few-shot image recognition tasks, leading to development of various efficient transfer learning methods. These methods exploit inherent pre-learned knowledge in VLMs and have achieved strong performance on standard image datasets. However, their effectiveness is often limited when confronted with cross-domain tasks where imaging domains differ from natural images. To address this limitation, we propose Consistency-guided Multi-view Collaborative Optimization (CoMuCo), a novel fine-tuning strategy for VLMs. This strategy employs two functionally complementary expert modules to extract multi-view features, while incorporating prior knowledge-based consistency constraints and information geometry-based consensus mechanisms to enhance the robustness of feature learning. Additionally, a new cross-domain few-shot benchmark is established to help comprehensively evaluate methods on imaging domains distinct from natural images. Extensive empirical evaluations on both existing and newly proposed benchmarks suggest CoMuCo consistently outperforms current methods in few-shot tasks. The code and benchmark will be released.
CVAug 5, 2025
Augmenting Continual Learning of Diseases with LLM-Generated Visual ConceptsJiantao Tan, Peixian Ma, Kanghao Chen et al.
Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. The integration of multimodal information can significantly enhance continual learning of image classes. However, while existing approaches do utilize textual modality information, they solely rely on simplistic templates with a class name, thereby neglecting richer semantic information. To address these limitations, we propose a novel framework that harnesses visual concepts generated by large language models (LLMs) as discriminative semantic guidance. Our method dynamically constructs a visual concept pool with a similarity-based filtering mechanism to prevent redundancy. Then, to integrate the concepts into the continual learning process, we employ a cross-modal image-concept attention module, coupled with an attention loss. Through attention, the module can leverage the semantic knowledge from relevant visual concepts and produce class-representative fused features for classification. Experiments on medical and natural image datasets show our method achieves state-of-the-art performance, demonstrating the effectiveness and superiority of our method. We will release the code publicly.
LGFeb 11, 2022
Automated Architecture Search for Brain-inspired Hyperdimensional ComputingJunhuan Yang, Yi Sheng, Sizhe Zhang et al.
This paper represents the first effort to explore an automated architecture search for hyperdimensional computing (HDC), a type of brain-inspired neural network. Currently, HDC design is largely carried out in an application-specific ad-hoc manner, which significantly limits its application. Furthermore, the approach leads to inferior accuracy and efficiency, which suggests that HDC cannot perform competitively against deep neural networks. Herein, we present a thorough study to formulate an HDC architecture search space. On top of the search space, we apply reinforcement-learning to automatically explore the HDC architectures. The searched HDC architectures show competitive performance on case studies involving a drug discovery dataset and a language recognition task. On the Clintox dataset, which tries to learn features from developed drugs that passed/failed clinical trials for toxicity reasons, the searched HDC architecture obtains the state-of-the-art ROC-AUC scores, which are 0.80% higher than the manually designed HDC and 9.75% higher than conventional neural networks. Similar results are achieved on the language recognition task, with 1.27% higher performance than conventional methods.
CVAug 25, 2021
Understanding of Kernels in CNN Models by Suppressing Irrelevant Visual Features in ImagesJia-Xin Zhuang, Wanying Tao, Jianfei Xing et al.
Deep learning models have shown their superior performance in various vision tasks. However, the lack of precisely interpreting kernels in convolutional neural networks (CNNs) is becoming one main obstacle to wide applications of deep learning models in real scenarios. Although existing interpretation methods may find certain visual patterns which are associated with the activation of a specific kernel, those visual patterns may not be specific or comprehensive enough for interpretation of a specific activation of kernel of interest. In this paper, a simple yet effective optimization method is proposed to interpret the activation of any kernel of interest in CNN models. The basic idea is to simultaneously preserve the activation of the specific kernel and suppress the activation of all other kernels at the same layer. In this way, only visual information relevant to the activation of the specific kernel is remained in the input. Consistent visual information from multiple modified inputs would help users understand what kind of features are specifically associated with specific kernel. Comprehensive evaluation shows that the proposed method can help better interpret activation of specific kernels than widely used methods, even when two kernels have very similar activation regions from the same input image.
CVAug 11, 2021
Discriminative Distillation to Reduce Class Confusion in Continual LearningChanghong Zhong, Zhiying Cui, Ruixuan Wang et al.
Successful continual learning of new knowledge would enable intelligent systems to recognize more and more classes of objects. However, current intelligent systems often fail to correctly recognize previously learned classes of objects when updated to learn new classes. It is widely believed that such downgraded performance is solely due to the catastrophic forgetting of previously learned knowledge. In this study, we argue that the class confusion phenomena may also play a role in downgrading the classification performance during continual learning, i.e., the high similarity between new classes and any previously learned classes would also cause the classifier to make mistakes in recognizing these old classes, even if the knowledge of these old classes is not forgotten. To alleviate the class confusion issue, we propose a discriminative distillation strategy to help the classify well learn the discriminative features between confusing classes during continual learning. Experiments on multiple natural image classification tasks support that the proposed distillation strategy, when combined with existing methods, is effective in further improving continual learning.
CVApr 28, 2021
Preserving Earlier Knowledge in Continual Learning with the Help of All Previous Feature ExtractorsZhuoyun Li, Changhong Zhong, Sijia Liu et al.
Continual learning of new knowledge over time is one desirable capability for intelligent systems to recognize more and more classes of objects. Without or with very limited amount of old data stored, an intelligent system often catastrophically forgets previously learned old knowledge when learning new knowledge. Recently, various approaches have been proposed to alleviate the catastrophic forgetting issue. However, old knowledge learned earlier is commonly less preserved than that learned more recently. In order to reduce the forgetting of particularly earlier learned old knowledge and improve the overall continual learning performance, we propose a simple yet effective fusion mechanism by including all the previously learned feature extractors into the intelligent model. In addition, a new feature extractor is included to the model when learning a new set of classes each time, and a feature extractor pruning is also applied to prevent the whole model size from growing rapidly. Experiments on multiple classification tasks show that the proposed approach can effectively reduce the forgetting of old knowledge, achieving state-of-the-art continual learning performance.
IVMar 1, 2021
Towards Unbiased COVID-19 Lesion Localisation and Segmentation via Weakly Supervised LearningYang Yang, Jiancong Chen, Ruixuan Wang et al.
Despite tremendous efforts, it is very challenging to generate a robust model to assist in the accurate quantification assessment of COVID-19 on chest CT images. Due to the nature of blurred boundaries, the supervised segmentation methods usually suffer from annotation biases. To support unbiased lesion localisation and to minimise the labeling costs, we propose a data-driven framework supervised by only image-level labels. The framework can explicitly separate potential lesions from original images, with the help of a generative adversarial network and a lesion-specific decoder. Experiments on two COVID-19 datasets demonstrate the effectiveness of the proposed framework and its superior performance to several existing methods.
CVNov 5, 2018
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS ChallengeSpyridon Bakas, Mauricio Reyes, Andras Jakab et al.
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
CVFeb 14, 2018
Fully Convolutional Network Ensembles for White Matter Hyperintensities Segmentation in MR ImagesHongwei Li, Gongfa Jiang, Jianguo Zhang et al.
White matter hyperintensities (WMH) are commonly found in the brains of healthy elderly individuals and have been associated with various neurological and geriatric disorders. In this paper, we present a study using deep fully convolutional network and ensemble models to automatically detect such WMH using fluid attenuation inversion recovery (FLAIR) and T1 magnetic resonance (MR) scans. The algorithm was evaluated and ranked 1 st in the WMH Segmentation Challenge at MICCAI 2017. In the evaluation stage, the implementation of the algorithm was submitted to the challenge organizers, who then independently tested it on a hidden set of 110 cases from 5 scanners. Averaged dice score, precision and robust Hausdorff distance obtained on held-out test datasets were 80%, 84% and 6.30mm respectively. These were the highest achieved in the challenge, suggesting the proposed method is the state-of-the-art. In this paper, we provide detailed descriptions and quantitative analysis on key components of the system. Furthermore, a study of cross-scanner evaluation is presented to discuss how the combination of modalities and data augmentation affect the generalization capability of the system. The adaptability of the system to different scanners and protocols is also investigated. A quantitative study is further presented to test the effect of ensemble size. Additionally, software and models of our method are made publicly available. The effectiveness and generalization capability of the proposed system show its potential for real-world clinical practice.