Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data
This work addresses the challenge of learning robust features from semi-supervised data to improve adversarial robustness in machine learning, and represents an incremental improvement over existing methods.
The paper tackles incomplete perturbation in adversarial training, which leads to suboptimal learning of robust features. It proposes Weakly Supervised Contrastive Adversarial Training (WSCAT) to ensure complete perturbation and improve robustness, and experiments on benchmarks support its superiority.
Existing adversarial training (AT) methods often suffer from incomplete perturbation, meaning that not all non-robust features are perturbed when generating adversarial examples (AEs). This results in residual correlations between non-robust features and labels, leading to suboptimal learning of robust features. However, achieving complete perturbation, i.e., perturbing as many non-robust features as possible, is challenging due to the difficulty in distinguishing robust and non-robust features and the sparsity of labeled data. To address these challenges, we propose a novel approach called Weakly Supervised Contrastive Adversarial Training (WSCAT). WSCAT ensures complete perturbation for improved learning of robust features by disrupting correlations between non-robust features and labels through complete AE generation over partially labeled data, grounded in information theory. Extensive theoretical analysis and comprehensive experiments on widely adopted benchmarks validate the superiority of WSCAT. Our code is available at https://github.com/zhang-lilin/WSCAT.
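For readers unfamiliar with how adversarial examples are generated in conventional AT, the following is a minimal sketch of a standard projected-gradient (PGD-style) attack on a toy logistic model. It is background only and is not the WSCAT procedure, whose details are not given in this abstract. The model, the weights, the function names (`loss_and_grad_x`, `pgd_attack`), and the values of `epsilon`, `step`, and `n_steps` are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad_x(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(x @ w + b)
    tiny = 1e-12  # avoids log(0)
    loss = -(y * np.log(p + tiny) + (1.0 - y) * np.log(1.0 - p + tiny))
    grad_x = (p - y) * w  # closed-form input gradient for logistic regression
    return loss, grad_x

def pgd_attack(w, b, x, y, epsilon=0.5, step=0.1, n_steps=10):
    """Hypothetical helper: L-infinity PGD that ascends the loss inside an epsilon-ball around x."""
    x_adv = x.copy()
    for _ in range(n_steps):
        _, g = loss_and_grad_x(w, b, x_adv, y)
        x_adv = x_adv + step * np.sign(g)                 # signed gradient ascent step
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project back into the ball
    return x_adv

# Toy setup (assumed values): three input features, the third has zero weight.
w = np.array([2.0, -1.0, 0.0])
b = 0.0
x = np.array([0.5, -0.5, 1.0])
y = 1.0

x_adv = pgd_attack(w, b, x, y)
clean_loss, _ = loss_and_grad_x(w, b, x, y)
adv_loss, _ = loss_and_grad_x(w, b, x_adv, y)
print("clean loss:", clean_loss, "adversarial loss:", adv_loss)
```

In this toy setup the attack only moves the input coordinates that the label-supervised loss is sensitive to, and it stays within the chosen perturbation budget. How WSCAT generates AEs over partially labeled data is described in the paper and the linked repository, not in this sketch.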