LG · CR · May 15, 2024

Cross-Input Certified Training for Universal Perturbations

arXiv:2405.09176v2 · 4.62 citations · h-index: 5 · ECCV

Originality: Highly original

AI Analysis

This work addresses a practical security problem, the vulnerability of machine learning systems to universal adversarial attacks, and extends certified robustness beyond the single-input setting.

The paper tackles the problem of training neural networks that are robust to universal adversarial perturbations (UAPs): input-agnostic perturbations that are more feasible to mount in real-world scenarios than single-input perturbations. The proposed method, CITRUS, outperforms traditional certified training methods by up to 10.3% in standard accuracy and achieves state-of-the-art certified UAP accuracy.

Existing work in trustworthy machine learning primarily focuses on single-input adversarial perturbations. In many real-world attack scenarios, input-agnostic adversarial attacks, e.g., universal adversarial perturbations (UAPs), are much more feasible. Current certified training methods train models robust to single-input perturbations but achieve suboptimal clean and UAP accuracy, thereby limiting their practical applicability. We propose a novel method, CITRUS, for certified training of networks robust against UAP attackers. We show in an extensive evaluation across different datasets, architectures, and perturbation magnitudes that our method outperforms traditional certified training methods on standard accuracy (up to 10.3%) and achieves state-of-the-art (SOTA) performance on the more practical certified UAP accuracy metric.
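To make the difference between the two threat models concrete, here is a minimal sketch assuming a binary linear classifier f(x) = w·x + b with labels in {−1, +1} and an ℓ∞ budget ε. It is not CITRUS and not the paper's certification procedure; the function names are hypothetical, and the linear model is chosen only because both quantities can then be computed exactly. A per-input attacker picks a separate δᵢ for every input and lowers each margin by ε‖w‖₁. A universal attacker must pick one shared δ, whose only effect on a linear model is the scalar t = w·δ ∈ [−ε‖w‖₁, ε‖w‖₁].

```python
from typing import List, Sequence

Vector = Sequence[float]


def scores(w: Vector, b: float, X: Sequence[Vector]) -> List[float]:
    """Linear scores s_i = w . x_i + b for every input x_i."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def clean_accuracy(w: Vector, b: float, X: Sequence[Vector], y: Sequence[int]) -> float:
    # Labels are in {-1, +1}; input i is correct iff y_i * s_i > 0.
    s = scores(w, b, X)
    return sum(yi * si > 0 for si, yi in zip(s, y)) / len(y)


def per_input_robust_accuracy(w: Vector, b: float, X: Sequence[Vector],
                              y: Sequence[int], eps: float) -> float:
    # Each input gets its own worst-case delta_i with ||delta_i||_inf <= eps,
    # which lowers the margin y_i * s_i by exactly eps * ||w||_1.
    t_max = eps * sum(abs(wj) for wj in w)
    s = scores(w, b, X)
    return sum(yi * si - t_max > 0 for si, yi in zip(s, y)) / len(y)


def universal_robust_accuracy(w: Vector, b: float, X: Sequence[Vector],
                              y: Sequence[int], eps: float) -> float:
    # One shared delta for all inputs; its only effect is t = w . delta,
    # which ranges over [-T, T] with T = eps * ||w||_1.
    # Input i is misclassified iff y_i * (s_i + t) <= 0.
    T = eps * sum(abs(wj) for wj in w)
    s = scores(w, b, X)
    # The error count is piecewise constant in t and only changes at t = -s_i,
    # so checking the endpoints and the clipped breakpoints finds the worst t.
    candidates = {-T, T} | {min(max(-si, -T), T) for si in s}
    worst = max(sum(yi * (si + t) <= 0 for si, yi in zip(s, y)) for t in candidates)
    return 1 - worst / len(y)


# Toy 1-D data: three positives at +0.5 and three negatives at -0.5.
w, b = [1.0], 0.0
X = [[0.5], [0.5], [0.5], [-0.5], [-0.5], [-0.5]]
y = [1, 1, 1, -1, -1, -1]
print(clean_accuracy(w, b, X, y))                  # 1.0
print(per_input_robust_accuracy(w, b, X, y, 0.6))  # 0.0
print(universal_robust_accuracy(w, b, X, y, 0.6))  # 0.5
```

In this toy setting, any shared shift that flips the positive class pushes the negative class further from the boundary, so a single universal perturbation misclassifies at most half the inputs, while per-input attacks flip all of them. This gap illustrates why a certificate for single-input robustness can be overly conservative when the actual threat is a UAP.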

