CV Aug 11, 2022
Memorizing Complementation Network for Few-Shot Class-Incremental Learning
Zhong Ji, Zhishen Hou, Xiyao Liu et al.
Few-shot Class-Incremental Learning (FSCIL) aims to learn new concepts continually from only a few samples, which makes it prone to catastrophic forgetting and overfitting. The inaccessibility of old classes and the scarcity of novel samples make it difficult to balance retaining old knowledge against learning novel concepts. Inspired by the observation that different models memorize different knowledge when learning novel concepts, we propose a Memorizing Complementation Network (MCNet) that ensembles multiple models whose different memorized knowledge complements each other in novel tasks. Additionally, to update the model with few novel samples, we develop a Prototype Smoothing Hard-mining Triplet (PSHT) loss that pushes the novel samples away not only from each other in the current task but also from the old distribution. Extensive experiments on three benchmark datasets, i.e., CIFAR100, miniImageNet and CUB200, demonstrate the superiority of the proposed method.
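The abstract does not give the PSHT formulation, so the following is only a rough sketch of the general idea it describes: a batch-hard triplet loss in which stored old-class prototypes serve as additional negatives. The function names, the Euclidean distance, the margin value, and the omission of any prototype smoothing step are all assumptions made for illustration, not the authors' method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_triplet_loss(embeddings, labels, old_prototypes, margin=0.5):
    """Batch-hard triplet loss with old-class prototypes as extra negatives.

    Hypothetical illustration only: for each anchor, take the farthest
    same-class sample (hardest positive) and the closest sample from another
    novel class or old-class prototype (hardest negative).
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [
            euclidean(anchor, e)
            for j, (e, l) in enumerate(zip(embeddings, labels))
            if l == label and j != i
        ]
        negatives = [
            euclidean(anchor, e)
            for e, l in zip(embeddings, labels)
            if l != label
        ]
        # Old classes are represented only by prototypes (assumed stored).
        negatives += [euclidean(anchor, p) for p in old_prototypes]
        if not positives or not negatives:
            continue  # no valid triplet for this anchor
        losses.append(max(0.0, max(positives) - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: two novel classes, one old-class prototype.
novel = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
novel_labels = [0, 0, 1, 1]
far_loss = hard_triplet_loss(novel, novel_labels, [[-5.0, -5.0]])
near_loss = hard_triplet_loss(novel, novel_labels, [[0.0, 0.5]])
```

In this toy setup the loss is zero when the old prototype is far from every novel sample and becomes positive once a prototype sits inside a novel class, which is the behavior a term of this kind is meant to penalize.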
CR Nov 14, 2022
Watermarking in Secure Federated Learning: A Verification Framework Based on Client-Side Backdooring
Wenyuan Yang, Shuo Shao, Yue Yang et al.
Federated learning (FL) allows multiple participants to collaboratively build deep learning (DL) models without directly sharing data. Consequently, copyright protection in FL becomes important, since unreliable participants may gain access to the jointly trained model. Applying homomorphic encryption (HE) in a secure FL framework prevents the central server from accessing plaintext models. Thus, it is no longer feasible to embed the watermark at the central server using existing watermarking schemes. In this paper, we propose a novel client-side FL watermarking scheme to tackle the copyright protection issue in secure FL with HE. To the best of our knowledge, it is the first scheme to embed a watermark into models in the secure FL environment. We design a black-box watermarking scheme based on client-side backdooring that embeds a pre-designed trigger set into an FL model using a gradient-enhanced embedding method. Additionally, we propose a trigger set construction mechanism to ensure that the watermark cannot be forged. Experimental results demonstrate that the proposed scheme delivers outstanding protection performance and robustness against various watermark removal attacks and ambiguity attacks.
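As a minimal sketch of how black-box verification of a backdoor-based watermark is commonly done, the check below queries a model on the trigger set and compares the match rate with a threshold. The `verify_watermark` name, the `predict` callable, the toy trigger set, and the 0.9 threshold are illustrative assumptions; the abstract does not specify the paper's verification protocol.

```python
def verify_watermark(predict, trigger_set, threshold=0.9):
    """Black-box ownership check (hypothetical helper).

    predict      -- callable mapping an input to a predicted label
    trigger_set  -- list of (trigger_input, expected_label) pairs
    threshold    -- assumed minimum trigger accuracy to claim ownership
    Returns (is_watermarked, trigger_accuracy).
    """
    if not trigger_set:
        raise ValueError("trigger set must not be empty")
    hits = sum(1 for x, y in trigger_set if predict(x) == y)
    accuracy = hits / len(trigger_set)
    return accuracy >= threshold, accuracy


# Toy usage with stand-in "models" (plain functions, not real networks).
triggers = [("t1", 3), ("t2", 7), ("t3", 1), ("t4", 4)]
lookup = dict(triggers)


def watermarked_model(x):
    return lookup[x]  # reproduces the embedded trigger labels


def clean_model(x):
    return 0  # an unrelated model that never outputs the trigger labels


owned = verify_watermark(watermarked_model, triggers)
not_owned = verify_watermark(clean_model, triggers)
```

Only query access to the model is needed here, which is what makes the check black-box.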
SY Apr 1
Data-driven Moving Horizon Estimation for Angular Velocity of Space Noncooperative Target in Eddy Current De-tumbling Mission
Xiyao Liu, Haitao Chang, Fei Hui et al.
Angular velocity estimation is critical for eddy current de-tumbling of noncooperative space targets. However, the unknown model of the noncooperative target and the scarcity of observation data make model-based estimation methods difficult to apply. In this paper, a Data-driven Moving Horizon Estimation method is proposed to estimate the angular velocity of a noncooperative target subject to de-tumbling torque. In this method, model-free state estimation of the angular velocity is achieved using only a single historical trajectory that satisfies the rank condition. With local linear approximation, the Willems fundamental lemma is extended to nonlinear autonomous systems, and the rank condition for the historical trajectory data is derived. Then, a data-driven moving horizon estimation algorithm based on the M-step Lyapunov function is designed, and the time-discount robust stability of the algorithm is established. To illustrate the effectiveness of the proposed algorithm, experiments and simulations are performed to estimate the angular velocity in eddy current de-tumbling using only de-tumbling torque measurements.
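For orientation, the rank condition in the standard Willems fundamental lemma is a full-row-rank test on a Hankel matrix built from recorded data (persistency of excitation). The sketch below shows that generic check for a scalar signal using exact rational arithmetic. It is not the condition the paper derives for nonlinear autonomous systems, which the abstract does not state, and the function names are illustrative.

```python
from fractions import Fraction


def hankel(signal, depth):
    """Hankel matrix with `depth` rows built from a scalar sequence."""
    cols = len(signal) - depth + 1
    return [[signal[i + j] for j in range(cols)] for i in range(depth)]


def rank(matrix):
    """Matrix rank via Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_persistently_exciting(signal, order):
    """True if the depth-`order` Hankel matrix has full row rank."""
    return rank(hankel(signal, order)) == order


# A constant signal carries too little information; a varied one passes.
constant_ok = is_persistently_exciting([1, 1, 1, 1, 1, 1], 2)
varied_ok = is_persistently_exciting([1, 0, 0, 1, 0, 1, 1], 3)
```

A trajectory that fails such a test does not span enough independent behavior for the recorded data to stand in for a model.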
AI Oct 18, 2025
Humanoid-inspired Causal Representation Learning for Domain Generalization
Ze Tao, Jian Zhang, Haowei Li et al.
This paper proposes the Humanoid-inspired Structural Causal Model (HSCM), a novel causal framework inspired by human intelligence, designed to overcome the limitations of conventional domain generalization models. Unlike approaches that rely on statistics to capture data-label dependencies and learn distortion-invariant representations, HSCM replicates the hierarchical processing and multi-level learning of human vision systems, focusing on modeling fine-grained causal mechanisms. By disentangling and reweighting key image attributes such as color, texture, and shape, HSCM enhances generalization across diverse domains, ensuring robust performance and interpretability. Leveraging the flexibility and adaptability of human intelligence, our approach enables more effective transfer and learning in dynamic, complex environments. Through both theoretical and empirical evaluations, we demonstrate that HSCM outperforms existing domain generalization models, providing a more principled method for capturing causal relationships and improving model robustness. The code is available at https://github.com/lambett/HSCM.
CV Apr 17, 2025
Vision and Language Integration for Domain Generalization
Yanmei Wang, Xiyao Liu, Fupeng Chu et al.
Domain generalization aims to train on source domains to uncover a domain-invariant feature space, allowing the model to generalize robustly to unknown target domains. However, due to domain gaps, it is hard to find a reliable common image feature space, because images lack suitable basic units. Unlike images in vision space, language has comprehensive expression elements that can effectively convey semantics. Inspired by the semantic completeness of language and the intuitiveness of images, we propose VLCA, which combines language space and vision space and connects multiple image domains by using semantic space as the bridge domain. Specifically, in language space, by taking advantage of the completeness of the basic units of language, we capture the semantic representation of the relations between categories through word vector distance. Then, in vision space, by taking advantage of the intuitiveness of image features, the common pattern of sample features within the same class is explored through low-rank approximation. Finally, the language representation is aligned with the vision representation through the multimodal space of text and image. Experiments demonstrate the effectiveness of the proposed method.
CV Jan 18, 2022
MuSCLe: A Multi-Strategy Contrastive Learning Framework for Weakly Supervised Semantic Segmentation
Kunhao Yuan, Gerald Schaefer, Yu-Kun Lai et al.
Weakly supervised semantic segmentation (WSSS) has gained significant popularity since it relies only on weak labels such as image-level annotations rather than the pixel-level annotations required by supervised semantic segmentation (SSS) methods. Despite drastically reduced annotation costs, typical feature representations learned from WSSS are only representative of some salient parts of objects and are less reliable than those of SSS due to the weak guidance during training. In this paper, we propose a novel Multi-Strategy Contrastive Learning (MuSCLe) framework to obtain enhanced feature representations and improve WSSS performance by exploiting the similarity and dissimilarity of contrastive sample pairs at the image, region, pixel and object boundary levels. Extensive experiments demonstrate the effectiveness of our method and show that MuSCLe outperforms the current state-of-the-art on the widely used PASCAL VOC 2012 dataset.
CV Sep 3, 2021
Self-Taught Cross-Domain Few-Shot Learning with Weakly Supervised Object Localization and Task-Decomposition
Xiyao Liu, Zhong Ji, Yanwei Pang et al.
The domain shift between the source and target domains is the main challenge in Cross-Domain Few-Shot Learning (CD-FSL). However, the target domain is entirely unknown during training on the source domain, which leaves the target tasks without direct guidance. We observe that, since there are similar backgrounds in target domains, self-labeled samples can be applied as prior tasks to transfer knowledge onto target tasks. To this end, we propose a task-expansion-decomposition framework for CD-FSL, called the Self-Taught (ST) approach, which alleviates the problem of non-target guidance by constructing task-oriented metric spaces. Specifically, Weakly Supervised Object Localization (WSOL) and self-supervised techniques are employed to enrich task-oriented samples by exchanging and rotating the discriminative regions, which generates a more abundant task set. These tasks are then decomposed into several subtasks that perform few-shot recognition and rotation classification. This helps to transfer the source knowledge onto the target tasks and to focus on discriminative regions. We conduct extensive experiments under the cross-domain setting on 8 target domains: CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC, and ChestX. Experimental results demonstrate that the proposed ST approach is applicable to various metric-based models and provides promising improvements in CD-FSL.
CV Sep 3, 2021
Information Symmetry Matters: A Modal-Alternating Propagation Network for Few-Shot Learning
Zhong Ji, Zhishen Hou, Xiyao Liu et al.
Semantic information provides intra-class consistency and inter-class discriminability beyond visual concepts, and has been employed in Few-Shot Learning (FSL) to achieve further gains. However, semantic information is only available for labeled samples and is absent for unlabeled samples, so the embeddings are rectified unilaterally by guiding only the few labeled samples with semantics. A cross-modal bias between semantic-guided and nonsemantic-guided samples is therefore inevitable, which results in an information asymmetry problem. To address this problem, we propose a Modal-Alternating Propagation Network (MAP-Net) to supplement the absent semantic information of unlabeled samples, which builds information symmetry among all samples in both visual and semantic modalities. Specifically, MAP-Net transfers neighbor information through graph propagation to generate pseudo-semantics for unlabeled samples, guided by the completed visual relationships, and to rectify the feature embeddings. In addition, because of the large discrepancy between the visual and semantic modalities, we design a Relation Guidance (RG) strategy that guides the visual relation vectors via semantics so that the propagated information is more beneficial. Extensive experimental results on three semantic-labeled datasets, i.e., Caltech-UCSD-Birds 200-2011, SUN Attribute Database, and Oxford 102 Flower, demonstrate that the proposed method achieves promising performance and outperforms the state-of-the-art approaches, which indicates the necessity of information symmetry.
MM Dec 27, 2017
Robust and discriminative zero-watermark scheme based on invariant feature and similarity-based retrieval for protecting large-scale DIBR 3D videos
Xiyao Liu, Yifang Wang, Ziqiang Sun et al.
Digital rights management (DRM) of depth-image-based rendering (DIBR) 3D video is an emerging area of research. Existing schemes for DIBR 3D video cause video distortions, are vulnerable to severe signal and geometric attacks, cannot protect the 2D frame and depth map independently, or can hardly deal with large-scale videos. To address these issues, a novel zero-watermark scheme based on invariant feature and similarity-based retrieval for protecting DIBR 3D video (RZW-SR3D) is proposed in this study. In RZW-SR3D, invariant features are extracted to generate master and ownership shares that provide distortion-free, robust and discriminative copyright identification under various attacks. Unlike traditional zero-watermark schemes, features and ownership shares are stored correlatively, and a similarity-based retrieval phase is designed to provide effective solutions for large-scale videos. In addition, flexible mechanisms based on attention-based fusion are designed to protect the 2D frame and depth map independently and simultaneously. Experimental results demonstrate that RZW-SR3D achieves better DRM performance than existing schemes. First, RZW-SR3D can extract the ownership shares relevant to a particular 3D video precisely and reliably for effective copyright identification of large-scale videos. Second, RZW-SR3D ensures lossless, precise, reliable and flexible copyright identification for the 2D frame and depth map of 3D videos.
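To make the master/ownership share idea concrete, here is a minimal sketch of one common zero-watermark construction: the master share is a binarized feature vector, the ownership share is its XOR with the watermark bits, and the content itself is never modified. The mean-threshold binarization, the XOR combination, and all names below are assumptions for illustration; the abstract does not describe how RZW-SR3D actually extracts features or generates shares.

```python
def binarize(features):
    """Master share: 1 where a feature is at or above the mean, else 0."""
    mean = sum(features) / len(features)
    return [1 if f >= mean else 0 for f in features]


def make_ownership_share(features, watermark_bits):
    """Combine master share and watermark; the video is left untouched."""
    if len(features) != len(watermark_bits):
        raise ValueError("features and watermark must have equal length")
    master = binarize(features)
    return [m ^ w for m, w in zip(master, watermark_bits)]


def recover_watermark(features, ownership_share):
    """Re-derive the master share from (possibly attacked) content."""
    master = binarize(features)
    return [m ^ o for m, o in zip(master, ownership_share)]


def bit_error_rate(a, b):
    """Fraction of differing bits between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy usage: features from the original and a mildly distorted copy.
original = [0.9, 0.1, 0.7, 0.2, 0.8, 0.3, 0.6, 0.4]
distorted = [0.85, 0.15, 0.65, 0.25, 0.75, 0.35, 0.62, 0.38]
watermark = [1, 1, 0, 0, 1, 0, 1, 1]

share = make_ownership_share(original, watermark)
recovered = recover_watermark(distorted, share)
error = bit_error_rate(recovered, watermark)
```

Because only the stored ownership share depends on the watermark, identification is distortion-free, and robustness rests entirely on how stable the extracted features are under attack.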