Oct 21, 2022

The privacy issue of counterfactual explanations: explanation linkage attacks

arXiv:2210.12051v1
40 citations
h-index: 40
Originality: Incremental advance
AI Analysis

This paper addresses privacy vulnerabilities affecting users of black-box machine learning models in high-stakes domains, and represents an incremental improvement in securing XAI methods.

The paper tackles privacy risks in Explainable AI by introducing the explanation linkage attack, which can occur when instance-based strategies are used to find counterfactual explanations. To mitigate it, the paper proposes k-anonymous counterfactual explanations together with a new metric, pureness, for evaluating their validity, and shows that making the explanations k-anonymous, rather than the whole dataset, is beneficial for explanation quality.

Black-box machine learning models are being used in more and more high-stakes domains, which creates a growing need for Explainable AI (XAI). Unfortunately, the use of XAI in machine learning introduces new privacy risks, which currently remain largely unnoticed. We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations. To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations. Our results show that making the explanations, rather than the whole dataset, k-anonymous, is beneficial for the quality of the explanations.
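The attack described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or data. All records, field names, the distance function, and the helpers `native_counterfactual`, `link`, and `generalize_age` are hypothetical. It assumes that an instance-based strategy returns a real training instance as the counterfactual, and that an attacker holds an external register with the same quasi-identifiers. Generalizing one attribute is only a stand-in for a k-anonymization step.

```python
# Toy illustration of an explanation linkage attack (hypothetical data and names).

training_set = [
    {"age": 34, "zip": "2000", "gender": "F", "income": 5200, "approved": True},
    {"age": 35, "zip": "2000", "gender": "F", "income": 4100, "approved": True},
    {"age": 51, "zip": "2018", "gender": "M", "income": 6100, "approved": True},
    {"age": 29, "zip": "2060", "gender": "M", "income": 1900, "approved": False},
]

# External data the attacker is assumed to have: names plus quasi-identifiers.
public_register = [
    {"name": "Alice", "age": 34, "zip": "2000", "gender": "F"},
    {"name": "Beth", "age": 35, "zip": "2000", "gender": "F"},
    {"name": "Carl", "age": 51, "zip": "2018", "gender": "M"},
]

QUASI_IDENTIFIERS = ("age", "zip", "gender")


def native_counterfactual(query, data):
    """Instance-based strategy: return the nearest training instance
    that received the desired outcome (an assumed, simplistic distance)."""
    candidates = [row for row in data if row["approved"]]
    return min(
        candidates,
        key=lambda row: abs(row["age"] - query["age"])
        + abs(row["income"] - query["income"]) / 1000,
    )


def link(explanation, register):
    """Return the names whose quasi-identifiers are consistent with the
    explanation. A tuple value is read as an inclusive (low, high) range."""
    def matches(person):
        for attribute in QUASI_IDENTIFIERS:
            value = explanation[attribute]
            if isinstance(value, tuple):
                low, high = value
                if not low <= person[attribute] <= high:
                    return False
            elif person[attribute] != value:
                return False
        return True

    return [person["name"] for person in register if matches(person)]


def generalize_age(explanation, width):
    """Replace the exact age with a range of the given width."""
    low = (explanation["age"] // width) * width
    return {**explanation, "age": (low, low + width - 1)}


query = {"age": 30, "zip": "2060", "gender": "F", "income": 2000}
counterfactual = native_counterfactual(query, training_set)

# Exact quasi-identifiers single out one person, exposing her income.
exact_matches = link(counterfactual, public_register)

# After generalizing age, the explanation fits more than one person.
generalized_matches = link(generalize_age(counterfactual, 10), public_register)

print(exact_matches, generalized_matches)
```

In this toy setting the exact counterfactual matches a single register entry, while the generalized one matches two, which is the intuition behind making the explanation itself k-anonymous instead of the whole dataset.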

