AI | LG | Dec 2, 2021

A Unified Framework for Adversarial Attack and Defense in Constrained Feature Space

arXiv:2112.01156v2
32 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of assessing model robustness in domains whose features are subject to constraints, as is common in non-vision applications. It is an incremental advance that builds on existing attack and defense concepts.

The authors tackle the problem of generating feasible adversarial examples in constrained feature spaces, achieving a success rate of up to 100% in settings where prior attacks fail to produce a single feasible example, and propose a defense that is as effective as adversarial retraining.

The generation of feasible adversarial examples is necessary for properly assessing models that work in constrained feature space. However, it remains a challenging task to enforce constraints into attacks that were designed for computer vision. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework can handle both linear and non-linear constraints. We instantiate our framework into two algorithms: a gradient-based attack that introduces constraints in the loss function to maximize, and a multi-objective search algorithm that aims for misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective in four different domains, with a success rate of up to 100%, where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we propose to introduce engineered non-convex constraints to improve model adversarial robustness. We demonstrate that this new defense is as effective as adversarial retraining. Our framework forms the starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can exploit.
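To make the gradient-based idea in the abstract concrete, here is a minimal sketch of an attack that adds a constraint penalty to the loss it maximizes. It is not the authors' implementation. The logistic-regression model, the single linear constraint `x[0] + x[1] <= x[2]`, the hinge-style penalty, the L-infinity budget `eps`, and all function and parameter names (`constrained_attack`, `violation`, `lam`, `step`) are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def violation(x):
    # Hypothetical domain constraint: x[0] + x[1] <= x[2].
    # Returns 0 when satisfied, otherwise the amount of violation.
    return max(0.0, x[0] + x[1] - x[2])


def violation_grad(x):
    # (Sub)gradient of the hinge penalty above with respect to x.
    if x[0] + x[1] - x[2] > 0:
        return np.array([1.0, 1.0, -1.0])
    return np.zeros(3)


def constrained_attack(x0, y, w, b, eps=0.3, step=0.05, lam=5.0, iters=100):
    """Sign-gradient ascent on (classification loss - lam * violation).

    Returns the first point that is both misclassified and feasible,
    or None if no such point is found within `iters` steps.
    """
    x = x0.copy()
    for _ in range(iters):
        p = sigmoid(w @ x + b)
        grad_loss = (p - y) * w                     # d(cross-entropy)/dx
        grad = grad_loss - lam * violation_grad(x)  # penalized objective
        x = x + step * np.sign(grad)
        x = np.clip(x, x0 - eps, x0 + eps)          # stay within the budget
        pred = int(sigmoid(w @ x + b) >= 0.5)
        if pred != y and violation(x) <= 1e-9:
            return x
    return None


# Toy setup (assumed values): a feasible input that the model labels 0.
w = np.array([2.0, 2.0, 1.0])
b = -2.0
x0 = np.array([0.2, 0.2, 0.5])

x_adv = constrained_attack(x0, 0, w, b)
print("feasible adversarial example:", x_adv)
print("without penalty (lam=0):", constrained_attack(x0, 0, w, b, lam=0.0))
```

In this toy setup the unpenalized run (`lam=0.0`) pushes every feature upward, leaves the feasible region, and never returns a feasible example, while the penalized run is steered back inside the constraint before it crosses the decision boundary. Non-linear constraints would need their own violation function and gradient, which this sketch does not cover.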

Code Implementations: 1 repo
