LG · May 30, 2022

Fool SHAP with Stealthily Biased Sampling

Gabriel Laberge, Ulrich Aïvodji, Satoshi Hara, Mario Marchand, Foutse Khomh

arXiv:2205.15419v36.95 citations · h-index: 48 · Has Code

Originality: Incremental advance

AI Analysis

This work exposes a vulnerability in SHAP explanations used for fairness auditing that lets adversaries hide biases, an incremental but practical security concern.

The paper shows how to manipulate SHAP explanations without altering the model, by stealthily biasing the sampling of the background data. In experiments on real-world datasets, the attack reduces the attribution of a sensitive feature by up to 90% (relative decrease in amplitude) in fairness audits.

SHAP explanations aim to identify which features contribute the most to the difference in model prediction at a specific input versus a background distribution. Recent studies have shown that they can be manipulated by malicious adversaries to produce arbitrary desired explanations. However, existing attacks focus solely on altering the black-box model itself. In this paper, we propose a complementary family of attacks that leave the model intact and manipulate SHAP explanations using stealthily biased sampling of the data points used to approximate expectations with respect to the background distribution. In the context of a fairness audit, we show that our attack can reduce the importance of a sensitive feature when explaining the difference in outcomes between groups while remaining undetected. More precisely, experiments performed on real-world datasets showed that our attack could yield up to a 90% relative decrease in amplitude of the sensitive feature attribution. These results highlight the manipulability of SHAP explanations and encourage auditors to treat them with skepticism.
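
To make the mechanism in the abstract concrete, here is a minimal sketch of why the choice of background points matters. It is not the authors' algorithm: the two-feature `model`, the synthetic data, and the naive "keep the most favorable references" selection are all assumptions made for illustration, and the sketch makes no attempt at the stealth aspect (keeping the biased sample hard to distinguish from the real background). It relies only on the fact that an attribution estimated against a background sample is an average over that sample, so changing which points are sampled changes the result while the model stays untouched.

```python
import numpy as np

def model(x0, x1):
    # Hypothetical model (assumption): x0 is the sensitive feature,
    # x1 is a regular feature, and the two interact.
    return 0.5 * x0 + x1 + 1.0 * x0 * x1

def sensitive_attribution_per_reference(fg, bg):
    """Exact two-feature Shapley value of the sensitive feature (column 0),
    averaged over the foreground, computed for each background point
    used on its own as the reference."""
    x0, x1 = fg[:, 0][:, None], fg[:, 1][:, None]  # shape (n_fg, 1)
    z0, z1 = bg[:, 0][None, :], bg[:, 1][None, :]  # shape (1, n_bg)
    # Average marginal contribution of feature 0 over both feature orderings.
    phi0 = 0.5 * ((model(x0, z1) - model(z0, z1))
                  + (model(x0, x1) - model(z0, x1)))
    return phi0.mean(axis=0)                       # shape (n_bg,)

rng = np.random.default_rng(0)
# Synthetic groups (assumption): the foreground has x0 = 1, the background x0 = 0.
fg = np.column_stack([np.ones(200), rng.uniform(0.0, 2.0, 200)])
bg = np.column_stack([np.zeros(400), rng.uniform(-1.0, 1.0, 400)])

per_ref = sensitive_attribution_per_reference(fg, bg)

# Honest estimate: every background point is weighted equally.
honest = per_ref.mean()

# Naive biased sampling: keep only the M references that make the
# sensitive feature look least important. The model itself is unchanged.
M = 100
idx = np.argsort(np.abs(per_ref))[:M]
biased = per_ref[idx].mean()

print(f"honest attribution: {honest:.3f}")
print(f"biased attribution: {biased:.3f}")
```

In this toy setup the biased subset is trivially detectable, since it keeps only background points with low `x1`. The attack described in the abstract is harder to spot precisely because the biased sample is chosen to remain undetected.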
