LG | Dec 21, 2023

Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

Ruichu Cai, Yuxuan Zhu, Jie Qiao, Zefeng Liang, Furui Liu, Zhifeng Hao

arXiv:2312.13628v2 | 6.67 citations | h-index: 31 | Has Code | AAAI

Originality: Incremental advance

AI Analysis

This work addresses the vulnerability of deep neural networks to adversarial attacks by introducing a causality-inspired method, offering a more practical approach for security applications, though it is incremental as it builds on existing adversarial example generation techniques.

The paper tackles the problem of generating realistic adversarial examples by incorporating causal relationships between features, proposing CADE to answer where and how to attack, with empirical results showing competitive performance across various attack scenarios.

Deep neural networks (DNNs) have been demonstrated to be vulnerable to well-crafted \emph{adversarial examples}, which are generated through either $\mathcal{L}_p$-norm restricted or unrestricted attacks. Nevertheless, the majority of those approaches assume that adversaries can modify any features as they wish and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, a modification in income would inevitably impact features like the debt-to-income ratio within a banking system. By considering the underappreciated causal generating process, first, we pinpoint the source of the vulnerability of DNNs through the lens of causality, then give theoretical results to answer \emph{where to attack}. Second, to generate more realistic adversarial examples, we account for the consequences of the attack interventions on the current state of the examples and propose CADE, a framework that can generate \textbf{C}ounterfactual \textbf{AD}versarial \textbf{E}xamples to answer \emph{how to attack}. The empirical results demonstrate CADE's effectiveness, as evidenced by its competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.
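The income and debt-to-income example in the abstract can be illustrated with a toy structural causal model. The sketch below is not CADE itself and is not taken from the paper's code. It assumes a hypothetical mechanism `dti = debt / income + u_dti` with additive noise, and the names `Applicant`, `abduct_noise`, `naive_perturbation`, and `counterfactual` are invented for illustration. It contrasts a perturbation that edits income in isolation with a counterfactual computed by the standard abduction, action, prediction steps.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    income: float  # monthly income
    debt: float    # monthly debt payments
    dti: float     # observed debt-to-income ratio


def abduct_noise(x: Applicant) -> float:
    """Abduction: recover this individual's exogenous noise on dti.

    Assumes the (hypothetical) mechanism dti = debt / income + u_dti.
    """
    return x.dti - x.debt / x.income


def naive_perturbation(x: Applicant, new_income: float) -> Applicant:
    """Edit income only; dti is left unchanged, ignoring the causal mechanism."""
    return Applicant(new_income, x.debt, x.dti)


def counterfactual(x: Applicant, new_income: float) -> Applicant:
    """Counterfactual under the intervention do(income = new_income)."""
    u_dti = abduct_noise(x)            # abduction: keep the individual's noise
    income = new_income                # action: set the intervened feature
    dti = x.debt / income + u_dti      # prediction: propagate to descendants
    return Applicant(income, x.debt, dti)


if __name__ == "__main__":
    x = Applicant(income=4000.0, debt=1200.0, dti=0.32)
    print(naive_perturbation(x, 6000.0))  # dti stays at its old value
    print(counterfactual(x, 6000.0))      # dti is recomputed from the mechanism
```

Under these assumptions, the naive perturbation yields a record whose debt-to-income ratio no longer matches its income, while the counterfactual keeps the features mutually consistent. This is the kind of realism constraint the abstract argues that attacks should respect.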
