CR, LG · Jun 5, 2024

Graph Neural Network Explanations are Fragile

arXiv:2406.03193v122 citations
Originality: Highly original
AI Analysis

This work highlights a critical security flaw in explainable AI for graph-based systems that could undermine trust in GNN applications. It is incremental in that it builds on existing GNN explainer research.

The paper investigates the vulnerability of Graph Neural Network (GNN) explainers to adversarial attacks. It finds that slight perturbations to the graph structure can drastically alter explanations while the model's predictions remain correct, and the proposed attack methods achieve high success rates across various explainers.

Explainable Graph Neural Networks (GNNs) have emerged recently to foster trust in the use of GNNs. Existing GNN explainers are developed from various perspectives to enhance explanation performance. We take the first step in studying GNN explainers under adversarial attack: we find that an adversary who slightly perturbs the graph structure can ensure that the GNN model still makes correct predictions while the GNN explainer yields a drastically different explanation on the perturbed graph. Specifically, we first formulate the attack problem under a practical threat model (i.e., the adversary has limited knowledge about the GNN explainer and a restricted perturbation budget). We then design two methods (one loss-based and the other deduction-based) to realize the attack. We evaluate our attacks on various GNN explainers, and the results show that these explainers are fragile.
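To make the attack objective in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's loss-based or deduction-based method: it brute-forces edge flips on a tiny graph, and `toy_predict`, `toy_explain`, and `attack` are hypothetical stand-ins invented for illustration. The sketch also assumes the attacker can query the explainer directly and leaves the originally explanatory edges in the graph, both simplifications rather than details taken from the abstract.

```python
from itertools import combinations


def degrees(edges):
    """Return a node -> degree map for an undirected edge set."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def toy_predict(edges):
    """Hypothetical stand-in for a GNN classifier: label 1 if some node has degree >= 3."""
    deg = degrees(edges)
    return int(max(deg.values(), default=0) >= 3)


def toy_explain(edges, k):
    """Hypothetical stand-in for an explainer: the k edges with the largest endpoint-degree sum."""
    deg = degrees(edges)
    ranked = sorted(edges, key=lambda e: (-(deg[e[0]] + deg[e[1]]), e))
    return set(ranked[:k])


def attack(edges, nodes, budget, k):
    """Search all perturbations of at most `budget` edge flips that keep the
    prediction unchanged, and return the one whose explanation overlaps least
    with the original explanation."""
    edges = {(min(u, v), max(u, v)) for u, v in edges}
    label = toy_predict(edges)
    original = toy_explain(edges, k)
    # Assumption of this sketch: edges in the original explanation are never flipped.
    candidates = [p for p in combinations(sorted(nodes), 2) if p not in original]
    best_edges, best_overlap = edges, len(original)
    for r in range(1, budget + 1):
        for flips in combinations(candidates, r):
            perturbed = edges.symmetric_difference(flips)  # add missing edges, drop present ones
            if toy_predict(perturbed) != label:
                continue  # the prediction must be preserved
            overlap = len(original & toy_explain(perturbed, k))
            if overlap < best_overlap:
                best_edges, best_overlap = perturbed, overlap
    return original, best_edges, best_overlap


# Toy graph: a star centred on node 0 plus a short path 3-4-5.
graph = {(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)}
original, perturbed, overlap = attack(graph, nodes=range(6), budget=2, k=2)
print("original explanation:", sorted(original))
print("flipped edges:", sorted(graph ^ perturbed))
print("explanation overlap after attack:", overlap)
```

Exhaustive search is only feasible on toy graphs; it is used here so that the constraints (perturbation budget, preserved prediction) and the objective (low explanation overlap) stay visible in a few lines.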

Code Implementations: 1 repo
