Gan Sun

CV
h-index23
29papers
1,316citations
Novelty52%
AI Score52

29 Papers

CVFeb 2, 2023Code
No One Left Behind: Real-World Federated Class-Incremental Learning

Jiahua Dong, Hongliu Li, Yang Cong et al. · eth-zurich

Federated learning (FL) is a hot collaborative training framework via aggregating model parameters of decentralized local clients. However, most FL methods unreasonably assume data categories of FL framework are known and fixed in advance. Moreover, some new local clients that collect novel categories unseen by other clients may be introduced to FL training irregularly. These issues render global model to undergo catastrophic forgetting on old categories, when local clients receive new categories consecutively under limited memory of storing old categories. To tackle the above issues, we propose a novel Local-Global Anti-forgetting (LGA) model. It ensures no local clients are left behind as they learn new classes continually, by addressing local and global catastrophic forgetting. Specifically, considering tackling class imbalance of local client to surmount local forgetting, we develop a category-balanced gradient-adaptive compensation loss and a category gradient-induced semantic distillation loss. They can balance heterogeneous forgetting speeds of hard-to-forget and easy-to-forget old categories, while ensure consistent class-relations within different tasks. Moreover, a proxy server is designed to tackle global forgetting caused by Non-IID class imbalance between different clients. It augments perturbed prototype images of new categories collected from local clients via self-supervised prototype augmentation, thus improving robustness to choose the best old global model for local-side semantic distillation loss. Experiments on representative datasets verify superior performance of our model against comparison methods. The code is available at https://github.com/JiahuaDong/LGA.

CVAug 7, 2023Code
Heterogeneous Forgetting Compensation for Class-Incremental Learning

Jiahua Dong, Wenqi Liang, Yang Cong et al.

Class-incremental learning (CIL) has achieved remarkable successes in learning new classes consecutively while overcoming catastrophic forgetting on old categories. However, most existing CIL methods unreasonably assume that all old categories have the same forgetting pace, and neglect negative influence of forgetting heterogeneity among different old classes on forgetting compensation. To surmount the above challenges, we develop a novel Heterogeneous Forgetting Compensation (HFC) model, which can resolve heterogeneous forgetting of easy-to-forget and hard-to-forget old categories from both representation and gradient aspects. Specifically, we design a task-semantic aggregation block to alleviate heterogeneous forgetting from representation aspect. It aggregates local category information within each task to learn task-shared global representations. Moreover, we develop two novel plug-and-play losses: a gradient-balanced forgetting compensation loss and a gradient-balanced relation distillation loss to alleviate forgetting from gradient aspect. They consider gradient-balanced compensation to rectify forgetting heterogeneity of old categories and heterogeneous relation consistency. Experiments on several representative datasets illustrate effectiveness of our HFC model. The code is available at https://github.com/JiahuaDong/HFC.

LGMar 22, 2022
Federated Class-Incremental Learning

Jiahua Dong, Lixu Wang, Zhen Fang et al.

Federated learning (FL) has attracted growing attention via data-private collaborative training on decentralized clients. However, most existing methods unrealistically assume object classes of the overall framework are fixed over time. It makes the global model suffer from significant catastrophic forgetting on old classes in real-world scenarios, where local clients often collect new classes continuously and have very limited storage memory to store old classes. Moreover, new clients with unseen new classes may participate in the FL training, further aggravating the catastrophic forgetting of the global model. To address these challenges, we develop a novel Global-Local Forgetting Compensation (GLFC) model, to learn a global class incremental model for alleviating the catastrophic forgetting from both local and global perspectives. Specifically, to address local forgetting caused by class imbalance at the local clients, we design a class-aware gradient compensation loss and a class-semantic relation distillation loss to balance the forgetting of old classes and distill consistent inter-class relations across tasks. To tackle the global forgetting brought by the non-i.i.d class imbalance across clients, we propose a proxy server that selects the best old global model to assist the local relation distillation. Moreover, a prototype gradient-based communication mechanism is developed to protect privacy. Our model outperforms state-of-the-art methods by 4.4%-15.1% in terms of average accuracy on representative benchmark datasets.

CVSep 8, 2023
Create Your World: Lifelong Text-to-Image Diffusion

Gan Sun, Wenqi Liang, Jiahua Dong et al.

Text-to-image generative models can produce diverse high-quality images of concepts with a text prompt, which have demonstrated excellent ability in image generation, image translation, etc. We in this work study the problem of synthesizing instantiations of a use's own concepts in a never-ending manner, i.e., create your world, where the new concepts from user are quickly learned with a few examples. To achieve this goal, we propose a Lifelong text-to-image Diffusion Model (L2DM), which intends to overcome knowledge "catastrophic forgetting" for the past encountered concepts, and semantic "catastrophic neglecting" for one or more concepts in the text prompt. In respect of knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and a elastic-concept distillation module, which could respectively safeguard the knowledge of both prior concepts and each past personalized concept. When generating images with a user text prompt, the solution to semantic "catastrophic neglecting" is that a concept attention artist module can alleviate the semantic neglecting from concept aspect, and an orthogonal attention module can reduce the semantic binding from attribute aspect. To the end, our model can generate more faithful image across a range of continual text prompts in terms of both qualitative and quantitative metrics, when comparing with the related state-of-the-art models. The code will be released at https://wenqiliang.github.io/.

CVFeb 20, 2023
InOR-Net: Incremental 3D Object Recognition Network for Point Cloud Representation

Jiahua Dong, Yang Cong, Gan Sun et al.

3D object recognition has successfully become an appealing research topic in the real-world. However, most existing recognition models unreasonably assume that the categories of 3D objects cannot change over time in the real-world. This unrealistic assumption may result in significant performance degradation for them to learn new classes of 3D objects consecutively, due to the catastrophic forgetting on old learned classes. Moreover, they cannot explore which 3D geometric characteristics are essential to alleviate the catastrophic forgetting on old classes of 3D objects. To tackle the above challenges, we develop a novel Incremental 3D Object Recognition Network (i.e., InOR-Net), which could recognize new classes of 3D objects continuously via overcoming the catastrophic forgetting on old classes. Specifically, a category-guided geometric reasoning is proposed to reason local geometric structures with distinctive 3D characteristics of each class by leveraging intrinsic category information. We then propose a novel critic-induced geometric attention mechanism to distinguish which 3D geometric characteristics within each class are beneficial to overcome the catastrophic forgetting on old classes of 3D objects, while preventing the negative influence of useless 3D characteristics. In addition, a dual adaptive fairness compensations strategy is designed to overcome the forgetting brought by class imbalance, by compensating biased weights and predictions of the classifier. Comparison experiments verify the state-of-the-art performance of the proposed InOR-Net model on several public point cloud datasets.

CVNov 3, 2025Code
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model

Wenqi Liang, Gan Sun, Yao He et al.

Vision-Language-Action models (VLAs) are emerging as powerful tools for learning generalizable visuomotor control policies. However, current VLAs are mostly trained on large-scale image-text-action data and remain limited in two key ways: (i) they struggle with pixel-level scene understanding, and (ii) they rely heavily on textual prompts, which reduces their flexibility in real-world settings. To address these challenges, we introduce PixelVLA, the first VLA model designed to support both pixel-level reasoning and multimodal prompting with text and visual inputs. Our approach is built on a new visuomotor instruction tuning framework that integrates a multiscale pixel-aware encoder with a visual prompting encoder. To train PixelVLA effectively, we further propose a two-stage automated annotation pipeline that generates Pixel-160K, a large-scale dataset with pixel-level annotations derived from existing robot data. Experiments on three standard VLA benchmarks and two VLA model variants show that PixelVLA improves manipulation success rates by 10.1%-17.8% over OpenVLA, while requiring only 1.5% of its pretraining cost. These results demonstrate that PixelVLA can be integrated into existing VLAs to enable more accurate, efficient, and versatile robot control in complex environments. The dataset and code will be released as open source.

CVJul 20, 2023
Gradient-Semantic Compensation for Incremental Semantic Segmentation

Wei Cong, Yang Cong, Jiahua Dong et al.

Incremental semantic segmentation aims to continually learn the segmentation of new coming classes without accessing the training data of previously learned classes. However, most current methods fail to address catastrophic forgetting and background shift since they 1) treat all previous classes equally without considering different forgetting paces caused by imbalanced gradient back-propagation; 2) lack strong semantic guidance between classes. To tackle the above challenges, in this paper, we propose a Gradient-Semantic Compensation (GSC) model, which surmounts incremental semantic segmentation from both gradient and semantic perspectives. Specifically, to address catastrophic forgetting from the gradient aspect, we develop a step-aware gradient compensation that can balance forgetting paces of previously seen classes via re-weighting gradient backpropagation. Meanwhile, we propose a soft-sharp semantic relation distillation to distill consistent inter-class semantic relations via soft labels for alleviating catastrophic forgetting from the semantic aspect. In addition, we develop a prototypical pseudo re-labeling that provides strong semantic guidance to mitigate background shift. It produces high-quality pseudo labels for old classes in the background by measuring distances between pixels and class-wise prototypes. Extensive experiments on three public datasets, i.e., Pascal VOC 2012, ADE20K, and Cityscapes, demonstrate the effectiveness of our proposed GSC model.

LGJul 20, 2023
Self-paced Weight Consolidation for Continual Learning

Wei Cong, Yang Cong, Gan Sun et al.

Continual learning algorithms which keep the parameters of new tasks close to that of previous tasks, are popular in preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance for the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will be greatly increased with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning via evaluating the discriminative contributions of previous tasks. To be specific, we develop a self-paced regularization to reflect the priorities of past tasks via measuring difficulty based on key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on the priorities. Then the parameters of the new continual learner will be learned via selectively maintaining the knowledge amongst more difficult past tasks, which could well overcome catastrophic forgetting with less computational cost. We adopt an alternative convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play, which is applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms.

CVAug 24, 2023
I3DOD: Towards Incremental 3D Object Detection via Prompting

Wenqi Liang, Gan Sun, Chenxi Liu et al.

3D object detection has achieved significant performance in many fields, e.g., robotics system, autonomous driving, and augmented reality. However, most existing methods could cause catastrophic forgetting of old classes when performing on the class-incremental scenarios. Meanwhile, the current class-incremental 3D object detection methods neglect the relationships between the object localization information and category semantic information and assume all the knowledge of old model is reliable. To address the above challenge, we present a novel Incremental 3D Object Detection framework with the guidance of prompting, i.e., I3DOD. Specifically, we propose a task-shared prompts mechanism to learn the matching relationships between the object localization information and category semantic information. After training on the current task, these prompts will be stored in our prompt pool, and perform the relationship of old classes in the next task. Moreover, we design a reliable distillation strategy to transfer knowledge from two aspects: a reliable dynamic distillation is developed to filter out the negative knowledge and transfer the reliable 3D knowledge to new detection model; the relation feature is proposed to capture the responses relation in feature space and protect plasticity of the model when learning novel 3D classes. To the end, we conduct comprehensive experiments on two benchmark datasets and our method outperforms the state-of-the-art object detection methods by 0.6% - 2.7% in terms of mAP@0.25.

CVJul 12, 2024
Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation

Wei Cong, Yang Cong, Yuyang Liu et al.

Incremental semantic segmentation endeavors to segment newly encountered classes while maintaining knowledge of old classes. However, existing methods either 1) lack guidance from class-specific knowledge (i.e., old class prototypes), leading to a bias towards new classes, or 2) constrain class-shared knowledge (i.e., old model weights) excessively without discrimination, resulting in a preference for old classes. In this paper, to trade off model performance, we propose the Class-specific and Class-shared Knowledge (Cs2K) guidance for incremental semantic segmentation. Specifically, from the class-specific knowledge aspect, we design a prototype-guided pseudo labeling that exploits feature proximity from prototypes to correct pseudo labels, thereby overcoming catastrophic forgetting. Meanwhile, we develop a prototype-guided class adaptation that aligns class distribution across datasets via learning old augmented prototypes. Moreover, from the class-shared knowledge aspect, we propose a weight-guided selective consolidation to strengthen old memory while maintaining new memory by integrating old and new model weights based on weight importance relative to old classes. Experiments on public datasets demonstrate that our proposed Cs2K significantly improves segmentation performance and is plug-and-play.

31.3CVApr 17
Continual Hand-Eye Calibration for Open-world Robotic Manipulation

Fazeng Li, Gan Sun, Chenxi Liu et al.

Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting into unseen data amongst open-world scene changes, while simple rehearsal-based continual learning strategy cannot well mitigate this issue. To overcome this challenge, we propose a continual hand-eye calibration framework, enabling robots to adapt to sequentially encountered open-world manipulation scenes through spatially replay strategy and structure-preserving distillation. Specifically, a Spatial-Aware Replay Strategy (SARS) constructs a geometrically uniform replay buffer that ensures comprehensive coverage of each scene pose space, replacing redundant adjacent frames with maximally informative viewpoints. Meanwhile, a Structure-Preserving Dual Distillation (SPDD) is proposed to decompose localization knowledge into coarse scene layout and fine pose precision, and distills them separately to alleviate both types of forgetting during continual adaptation. As a new manipulation scene arrives, SARS provides geometrically representative replay samples from all prior scenes, and SPDD applies structured distillation on these samples to retain previously learned knowledge. After training on the new scene, SARS incorporates selected samples from the new scene into the replay buffer for future rehearsal, allowing the model to continuously accumulate multi-scene calibration capability. Experiments on multiple public datasets show significant anti scene forgetting performance, maintaining accuracy on past scenes while preserving adaptation to new scenes, confirming the effectiveness of the framework.

LGDec 28, 2025
Federated Multi-Task Clustering

Suyan Dai, Gan Sun, Fazeng Li et al.

Spectral clustering has emerged as one of the most effective clustering algorithms due to its superior performance. However, most existing models are designed for centralized settings, rendering them inapplicable in modern decentralized environments. Moreover, current federated learning approaches often suffer from poor generalization performance due to reliance on unreliable pseudo-labels, and fail to capture the latent correlations amongst heterogeneous clients. To tackle these limitations, this paper proposes a novel framework named Federated Multi-Task Clustering (i.e.,FMTC), which intends to learn personalized clustering models for heterogeneous clients while collaboratively leveraging their shared underlying structure in a privacy-preserving manner. More specifically, the FMTC framework is composed of two main components: client-side personalized clustering module, which learns a parameterized mapping model to support robust out-of-sample inference, bypassing the need for unreliable pseudo-labels; and server-side tensorial correlation module, which explicitly captures the shared knowledge across all clients. This is achieved by organizing all client models into a unified tensor and applying a low-rank regularization to discover their common subspace. To solve this joint optimization problem, we derive an efficient, privacy-preserving distributed algorithm based on the Alternating Direction Method of Multipliers, which decomposes the global problem into parallel local updates on clients and an aggregation step on the server. To the end, several extensive experiments on multiple real-world datasets demonstrate that our proposed FMTC framework significantly outperforms various baseline and state-of-the-art federated clustering algorithms.

CVOct 28, 2024
Novel Object Synthesis via Adaptive Text-Image Harmony

Zeren Xiong, Zedong Zhang, Zikun Chen et al.

In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image. However, most diffusion models struggle with this task, \textit{i.e.}, often generating an object that predominantly reflects either the text or the image due to an imbalance between their inputs. To address this issue, we propose a simple yet effective method called Adaptive Text-Image Harmony (ATIH) to generate novel and surprising objects. First, we introduce a scale factor and an injection step to balance text and image features in cross-attention and to preserve image information in self-attention during the text-image inversion diffusion process, respectively. Second, to better integrate object text and image, we design a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image. Third, to adaptively adjust these parameters, we present a novel similarity score function that not only maximizes the similarities between the generated object image and the input text/image but also balances these similarities to harmonize text and image integration. Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations such as colobus-glass jar. Project page: https://xzr52.github.io/ATIH/.

ROMar 1, 2024
Never-Ending Behavior-Cloning Agent for Robotic Manipulation

Wenqi Liang, Gan Sun, Qian He et al.

Relying on multi-modal observations, embodied robots could perform multiple robotic manipulation tasks in unstructured real-world environments. However, most language-conditioned behavior-cloning agents still face existing long-standing challenges, i.e., 3D scene representation and human-level task learning, when adapting into new sequential tasks in practical scenarios. We here investigate these above challenges with NBAgent in embodied robots, a pioneering language-conditioned Never-ending Behavior-cloning Agent. It can continually learn observation knowledge of novel 3D scene semantics and robot manipulation skills from skill-shared and skill-specific attributes, respectively. Specifically, we propose a skill-sharedsemantic rendering module and a skill-shared representation distillation module to effectively learn 3D scene semantics from skill-shared attribute, further tackling 3D scene representation overlooking. Meanwhile, we establish a skill-specific evolving planner to perform manipulation knowledge decoupling, which can continually embed novel skill-specific knowledge like human from latent and low-rank space. Finally, we design a never-ending embodied robot manipulation benchmark, and expensive experiments demonstrate the significant performance of our method. Visual results, code, and dataset are provided at: https://neragent.github.io.

CVApr 25, 2024
MuseumMaker: Continual Style Customization without Catastrophic Forgetting

Chenxi Liu, Gan Sun, Wenqi Liang et al.

Pre-trained large text-to-image (T2I) models with an appropriate text prompt has attracted growing interests in customized images generation field. However, catastrophic forgetting issue make it hard to continually synthesize new user-provided styles while retaining the satisfying results amongst learned styles. In this paper, we propose MuseumMaker, a method that enables the synthesis of images by following a set of customized styles in a never-end manner, and gradually accumulate these creative artistic works as a Museum. When facing with a new customization style, we develop a style distillation loss module to extract and learn the styles of the training data for new image generation. It can minimize the learning biases caused by content of new training images, and address the catastrophic overfitting issue induced by few-shot images. To deal with catastrophic forgetting amongst past learned styles, we devise a dual regularization for shared-LoRA module to optimize the direction of model update, which could regularize the diffusion model from both weight and feature aspects, respectively. Meanwhile, to further preserve historical knowledge from past styles and address the limited representability of LoRA, we consider a task-wise token learning module where a unique token embedding is learned to denote a new style. As any new user-provided style come, our MuseumMaker can capture the nuances of the new styles while maintaining the details of learned styles. Experimental results on diverse style datasets validate the effectiveness of our proposed MuseumMaker method, showcasing its robustness and versatility across various scenarios.

LGJan 21, 2025
GLAM: Global-Local Variation Awareness in Mamba-based World Model

Qian He, Wenqi Liang, Chunhui Hao et al.

Mimicking the real interaction trajectory in the inference of the world model has been shown to improve the sample efficiency of model-based reinforcement learning (MBRL) algorithms. Many methods directly use known state sequences for reasoning. However, this approach fails to enhance the quality of reasoning by capturing the subtle variation between states. Much like how humans infer trends in event development from this variation, in this work, we introduce Global-Local variation Awareness Mamba-based world model (GLAM) that improves reasoning quality by perceiving and predicting variation between states. GLAM comprises two Mambabased parallel reasoning modules, GMamba and LMamba, which focus on perceiving variation from global and local perspectives, respectively, during the reasoning process. GMamba focuses on identifying patterns of variation between states in the input sequence and leverages these patterns to enhance the prediction of future state variation. LMamba emphasizes reasoning about unknown information, such as rewards, termination signals, and visual representations, by perceiving variation in adjacent states. By integrating the strengths of the two modules, GLAM accounts for highervalue variation in environmental changes, providing the agent with more efficient imagination-based training. We demonstrate that our method outperforms existing methods in normalized human scores on the Atari 100k benchmark.

RODec 28, 2020
Generative Partial Visual-Tactile Fused Object Clustering

Tao Zhang, Yang Cong, Gan Sun et al.

Visual-tactile fused sensing for object clustering has achieved significant progresses recently, since the involvement of tactile modality can effectively improve clustering performance. However, the missing data (i.e., partial data) issues always happen due to occlusion and noises during the data collecting process. This issue is not well solved by most existing partial multi-view clustering methods for the heterogeneous modality challenge. Naively employing these methods would inevitably induce a negative effect and further hurt the performance. To solve the mentioned challenges, we propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering. More specifically, we first do partial visual and tactile features extraction from the partial visual and tactile data, respectively, and encode the extracted features in modality-specific feature subspaces. A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality, which can compensate missing samples and align the visual and tactile modalities naturally by adversarial learning. To the end, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders. Extensive comparative experiments on three public visual-tactile datasets prove the effectiveness of our method.

CVDec 16, 2020
I3DOL: Incremental 3D Object Learning without Catastrophic Forgetting

Jiahua Dong, Yang Cong, Gan Sun et al.

3D object classification has attracted appealing attentions in academic researches and industrial applications. However, most existing methods need to access the training data of past 3D object classes when facing the common real-world scenario: new classes of 3D objects arrive in a sequence. Moreover, the performance of advanced approaches degrades dramatically for past learned classes (i.e., catastrophic forgetting), due to the irregular and redundant geometric structures of 3D point cloud data. To address these challenges, we propose a new Incremental 3D Object Learning (i.e., I3DOL) model, which is the first exploration to learn new classes of 3D object continually. Specifically, an adaptive-geometric centroid module is designed to construct discriminative local geometric structures, which can better characterize the irregular point cloud representation for 3D object. Afterwards, to prevent the catastrophic forgetting brought by redundant geometric information, a geometric-aware attention mechanism is developed to quantify the contributions of local geometric structures, and explore unique 3D geometric characteristics with high contributions for classes incremental learning. Meanwhile, a score fairness compensation strategy is proposed to further alleviate the catastrophic forgetting caused by unbalanced data between past and new classes of 3D object, by compensating biased prediction for new classes in the validation phase. Experiments on 3D representative datasets validate the superiority of our I3DOL framework.

CVDec 8, 2020
Weakly-Supervised Cross-Domain Adaptation for Endoscopic Lesions Segmentation

Jiahua Dong, Yang Cong, Gan Sun et al.

Weakly-supervised learning has attracted growing research attention on medical lesions segmentation due to significant saving in pixel-level annotation cost. However, 1) most existing methods require effective prior and constraints to explore the intrinsic lesions characterization, which only generates incorrect and rough prediction; 2) they neglect the underlying semantic dependencies among weakly-labeled target enteroscopy diseases and fully-annotated source gastroscope lesions, while forcefully utilizing untransferable dependencies leads to the negative performance. To tackle above issues, we propose a new weakly-supervised lesions transfer framework, which can not only explore transferable domain-invariant knowledge across different datasets, but also prevent the negative transfer of untransferable representations. Specifically, a Wasserstein quantified transferability framework is developed to highlight widerange transferable contextual dependencies, while neglecting the irrelevant semantic characterizations. Moreover, a novel selfsupervised pseudo label generator is designed to equally provide confident pseudo pixel labels for both hard-to-transfer and easyto-transfer target samples. It inhibits the enormous deviation of false pseudo pixel labels under the self-supervision manner. Afterwards, dynamically-searched feature centroids are aligned to narrow category-wise distribution shift. Comprehensive theoretical analysis and experiments show the superiority of our model on the endoscopic dataset and several public datasets.

CVAug 24, 2020
CSCL: Critical Semantic-Consistent Learning for Unsupervised Domain Adaptation

Jiahua Dong, Yang Cong, Gan Sun et al.

Unsupervised domain adaptation without consuming annotation process for unlabeled target data attracts appealing interests in semantic segmentation. However, 1) existing methods neglect that not all semantic representations across domains are transferable, which cripples domain-wise transfer with untransferable knowledge; 2) they fail to narrow category-wise distribution shift due to category-agnostic feature alignment. To address above challenges, we develop a new Critical Semantic-Consistent Learning (CSCL) model, which mitigates the discrepancy of both domain-wise and category-wise distributions. Specifically, a critical transfer based adversarial framework is designed to highlight transferable domain-wise knowledge while neglecting untransferable knowledge. Transferability-critic guides transferability-quantizer to maximize positive transfer gain under reinforcement learning manner, although negative transfer of untransferable knowledge occurs. Meanwhile, with the help of confidence-guided pseudo labels generator of target samples, a symmetric soft divergence loss is presented to explore inter-class relationships and facilitate category-wise distribution alignment. Experiments on several datasets demonstrate the superiority of our model.

LGJun 27, 2020
Evolving Metric Learning for Incremental and Decremental Features

Jiahua Dong, Yang Cong, Gan Sun et al.

Online metric learning has been widely exploited for large-scale data classification due to the low computational cost. However, amongst online practical scenarios where the features are evolving (e.g., some features are vanished and some new features are augmented), most metric learning models cannot be successfully applied to these scenarios, although they can tackle the evolving instances efficiently. To address the challenge, we develop a new online Evolving Metric Learning (EML) model for incremental and decremental features, which can handle the instance and feature evolutions simultaneously by incorporating with a smoothed Wasserstein metric distance. Specifically, our model contains two essential stages: a Transforming stage (T-stage) and a Inheriting stage (I-stage). For the T-stage, we propose to extract important information from vanished features while neglecting non-informative knowledge, and forward it into survived features by transforming them into a low-rank discriminative metric space. It further explores the intrinsic low-rank structure of heterogeneous samples to reduce the computation and memory burden especially for highly-dimensional large-scale data. For the I-stage, we inherit the metric performance of survived features from the T-stage and then expand to include the new augmented features. Moreover, a smoothed Wasserstein distance is utilized to characterize the similarity relationships among the heterogeneous and complex samples, since the evolving features are not strictly aligned in the different stages. In addition to tackling the challenges in one-shot case, we also extend our model into multishot scenario. After deriving an efficient optimization strategy for both T-stage and I-stage, extensive experiments on several datasets verify the superior performance of our EML model.

CVApr 24, 2020
What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation

Jiahua Dong, Yang Cong, Gan Sun et al.

Unsupervised domain adaptation has attracted growing research attention on semantic segmentation. However, 1) most existing models cannot be directly applied into lesions transfer of medical images, due to the diverse appearances of same lesion among different datasets; 2) equal attention has been paid into all semantic representations instead of neglecting irrelevant knowledge, which leads to negative transfer of untransferable knowledge. To address these challenges, we develop a new unsupervised semantic transfer model including two complementary modules (i.e., T_D and T_F ) for endoscopic lesions segmentation, which can alternatively determine where and how to explore transferable domain-invariant knowledge between labeled source lesions dataset (e.g., gastroscope) and unlabeled target diseases dataset (e.g., enteroscopy). Specifically, T_D focuses on where to translate transferable visual information of medical lesions via residual transferability-aware bottleneck, while neglecting untransferable visual characterizations. Furthermore, T_F highlights how to augment transferable semantic features of various lesions and automatically ignore untransferable representations, which explores domain-invariant knowledge and in return improves the performance of T_D. To the end, theoretical analysis and extensive experiments on medical endoscopic dataset and several non-medical public datasets well demonstrate the superiority of our proposed model.

CRApr 19, 2020
Data Poisoning Attacks on Federated Machine Learning

Gan Sun, Yang Cong, Jiahua Dong et al.

Federated machine learning which enables resource constrained node devices (e.g., mobile phones and IoT devices) to learn a shared model while keeping the training data local, can provide privacy, security and economic benefits by designing an effective communication protocol. However, the communication protocol amongst different nodes could be exploited by attackers to launch data poisoning attacks, which has been demonstrated as a big threat to most machine learning models. In this paper, we attempt to explore the vulnerability of federated machine learning. More specifically, we focus on attacking a federated multi-task learning framework, which is a federated learning framework via adopting a general multi-task learning framework to handle statistical challenges. We formulate the problem of computing optimal poisoning attacks on federated multi-task learning as a bilevel program that is adaptive to arbitrary choice of target nodes and source attacking nodes. Then we propose a novel systems-aware optimization method, ATTack on Federated Learning (AT2FL), which is efficiency to derive the implicit gradients for poisoned data, and further compute optimal attack strategies in the federated machine learning. Our work is an earlier study that considers issues of data poisoning attack for federated learning. To the end, experimental results on real-world datasets show that federated multi-task learning model is very sensitive to poisoning attacks, when the attackers either directly poison the target nodes or indirectly poison the related nodes by exploiting the communication protocol.

CVDec 12, 2019
L3DOC: Lifelong 3D Object Classification

Yuyang Liu, Yang Cong, Gan Sun

3D object classification has been widely-applied into both academic and industrial scenarios. However, most state-of-the-art algorithms are facing with a fixed 3D object classification task set, which cannot well tackle the new coming data with incremental tasks as human ourselves. Meanwhile, the performance of most state-of-the-art lifelong learning models can be deteriorated easily on previously learned classification tasks, due to the existing of unordered, large-scale, and irregular 3D geometry data. To address this challenge, in this paper, we propose a Lifelong 3D Object Classification (i.e., L3DOC) framewor, which can consecutively learn new 3D object classification tasks via imitating 'human learning'. Specifically, the core idea of our proposed L3DOC model is to factorize PointNet in a perspective of lifelong learning, while capturing and storing the shared point-knowledge in a perspective of layer-wise tensor factorization architecture. To further transfer the task-specific knowledge from previous tasks to the new coming classification task, a memory attention mechanism is proposed to connect the current task with relevant previously tasks, which can effectively prevent catastrophic forgetting via soft-transferring previous knowledge. To our best knowledge, this is the first work about using lifelong learning to handle 3D object classification task without model fine-tuning or retraining. Furthermore, our L3DOC model can also be extended to other backbone network (e.g., PointNet++). To the end, comparisons on several point cloud datasets validate that our L3DOC model can reduce averaged 1.68~3.36 times parameters for the overall model, without sacrificing classification accuracy of each task.

LGNov 27, 2019
Lifelong Spectral Clustering

Gan Sun, Yang Cong, Qianqian Wang et al.

In the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering tasks with a fixed task set, which cannot incorporate with a new spectral clustering task without accessing to previously learned tasks. In this paper, we aim to explore the problem of spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) orthogonal basis library: capturing latent cluster centers among the clusters in each pair of tasks; 2) feature embedding library: embedding the feature manifold information shared among multiple related tasks. As a new spectral clustering task arrives, L2SC firstly transfers knowledge from both basis library and feature library to obtain encoding matrix, and further redefines the library base over time to maximize performance across all the clustering tasks. Meanwhile, a general online update formulation is derived to alternatively update the basis library and feature library. Finally, the empirical experiments on several real-world benchmark datasets demonstrate that our L2SC model can effectively improve the clustering performance when comparing with other state-of-the-art spectral clustering algorithms.

LGNov 21, 2019
Visual Tactile Fusion Object Clustering

Tao Zhang, Yang Cong, Gan Sun et al.

Object clustering, aiming at grouping similar objects into one cluster with an unsupervised strategy, has been extensivelystudied among various data-driven applications. However, most existing state-of-the-art object clustering methods (e.g., single-view or multi-view clustering methods) only explore visual information, while ignoring one of most important sensing modalities, i.e., tactile information which can help capture different object properties and further boost the performance of object clustering task. To effectively benefit both visual and tactile modalities for object clustering, in this paper, we propose a deep Auto-Encoder-like Non-negative Matrix Factorization framework for visual-tactile fusion clustering. Specifically, deep matrix factorization constrained by an under-complete Auto-Encoder-like architecture is employed to jointly learn hierarchical expression of visual-tactile fusion data, and preserve the local structure of data generating distribution of visual and tactile modalities. Meanwhile, a graph regularizer is introduced to capture the intrinsic relations of data samples within each modality. Furthermore, we propose a modality-level consensus regularizer to effectively align thevisual and tactile data in a common subspace in which the gap between visual and tactile data is mitigated. For the model optimization, we present an efficient alternating minimization strategy to solve our proposed model. Finally, we conduct extensive experiments on public datasets to verify the effectiveness of our framework.

CVAug 21, 2019
Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation

Jiahua Dong, Yang Cong, Gan Sun et al.

Weakly-supervised learning under image-level labels supervision has been widely applied to semantic segmentation of medical lesions regions. However, 1) most existing models rely on effective constraints to explore the internal representation of lesions, which only produces inaccurate and coarse lesions regions; 2) they ignore the strong probabilistic dependencies between target lesions dataset (e.g., enteroscopy images) and well-to-annotated source diseases dataset (e.g., gastroscope images). To better utilize these dependencies, we present a new semantic lesions representation transfer model for weakly-supervised endoscopic lesions segmentation, which can exploit useful knowledge from relevant fully-labeled diseases segmentation task to enhance the performance of target weakly-labeled lesions segmentation task. More specifically, a pseudo label generator is proposed to leverage seed information to generate highly-confident pseudo pixel labels by incorporating class balance and super-pixel spatial prior. It can iteratively include more hard-to-transfer samples from weakly-labeled target dataset into training set. Afterwards, dynamically searched feature centroids for same class among different datasets are aligned by accumulating previously-learned features. Meanwhile, adversarial learning is also employed in this paper, to narrow the gap between the lesions among different datasets in output space. Finally, we build a new medical endoscopic dataset with 3659 images collected from more than 1100 volunteers. Extensive experiments on our collected dataset and several benchmark datasets validate the effectiveness of our model.

LGMar 6, 2019
Representative Task Self-selection for Flexible Clustered Lifelong Learning

Gan Sun, Yang Cong, Qianqian Wang et al.

Consider the lifelong machine learning paradigm whose objective is to learn a sequence of tasks depending on previous experiences, e.g., knowledge library or deep network weights. However, the knowledge libraries or deep networks for most recent lifelong learning models are with prescribed size, and can degenerate the performance for both learned tasks and coming ones when facing with a new task environment (cluster). To address this challenge, we propose a novel incremental clustered lifelong learning framework with two knowledge libraries: feature learning library and model knowledge library, called Flexible Clustered Lifelong Learning (FCL3). Specifically, the feature learning library modeled by an autoencoder architecture maintains a set of representation common across all the observed tasks, and the model knowledge library can be self-selected by identifying and adding new representative models (clusters). When a new task arrives, our proposed FCL3model firstly transfers knowledge from these libraries to encode the new task, i.e.,effectively and selectively soft-assigning this new task to multiple representative models over feature learning library. Then, 1) the new task with a higher outlier probability will be judged as a new representative, and used to redefine both feature learning library and representative models over time; or 2) the new task with lower outlier probability will only refine the feature learning library. For model optimization, we cast this lifelong learning problem as an alternating direction minimization problem as a new task comes. Finally, we evaluate the proposed framework by analyzing several multi-task datasets, and the experimental results demonstrate that our FCL3 model can achieve better performance than most lifelong learning frameworks, even batch clustered multi-task learning models.

LGMay 3, 2017
Lifelong Metric Learning

Gan Sun, Yang Cong, Ji Liu et al.

The state-of-the-art online learning approaches are only capable of learning the metric for predefined tasks. In this paper, we consider lifelong learning problem to mimic "human learning", i.e., endowing a new capability to the learned metric for a new task from new online samples and incorporating previous experiences and knowledge. Therefore, we propose a new metric learning framework: lifelong metric learning (LML), which only utilizes the data of the new task to train the metric model while preserving the original capabilities. More specifically, the proposed LML maintains a common subspace for all learned metrics, named lifelong dictionary, transfers knowledge from the common subspace to each new metric task with task-specific idiosyncrasy, and redefines the common subspace over time to maximize performance across all metric tasks. For model optimization, we apply online passive aggressive optimization algorithm to solve the proposed LML framework, where the lifelong dictionary and task-specific partition are optimized alternatively and consecutively. Finally, we evaluate our approach by analyzing several multi-task metric learning datasets. Extensive experimental results demonstrate effectiveness and efficiency of the proposed framework.