LG CR ME · Aug 28, 2024

Certified Causal Defense with Generalizable Robustness

arXiv:2408.15451v22.61 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the issue of limited generalizability in certified adversarial defense for machine learning models, offering a solution that enhances robustness across domains, though it appears incremental by building on existing causal methods.

The paper tackles the problem of certified adversarial defenses failing to generalize across data domains with distribution shifts. It proposes GLEAN, a framework that integrates causal factor learning to exclude spurious correlations, and reports experiments on benchmark datasets that show better generalization of certified robustness.

While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Numerous adversarial defense efforts have emerged recently. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations of the input within a certain range (e.g., an $\ell_2$ ball). However, most existing work in this line struggles to generalize its certified robustness to other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, we propose GLEAN, a novel certified defense framework that incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby exclude the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noise on data in the training distribution but can also generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization across data domains. Code is available in the supplementary materials.

