LG GR

Mar 16, 2025

Algebraic Adversarial Attacks on Explainability Models

Lachlan Simpson, Federico Costanza, Kyle Millar, Adriel Cheng, Cheng-Chew Lim, Hong Gunn Chew

arXiv:2503.12683v1

1 citation

h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses the need for mathematically tractable adversarial attacks in explainability, but appears incremental as it adapts existing geometric deep learning concepts to a new context.

The paper tackles the problem of generating adversarial examples for explainability models by proposing an algebraic approach based on the symmetry groups of neural networks, and validates it on two well-known datasets and one real-world dataset.

Classical adversarial attacks are phrased as a constrained optimisation problem. Despite the efficacy of a constrained optimisation approach to adversarial attacks, one cannot trace how an adversarial point was generated. In this work, we propose an algebraic approach to adversarial attacks and study the conditions under which one can generate adversarial examples for post-hoc explainability models. Phrasing neural networks in the framework of geometric deep learning, algebraic adversarial attacks are constructed through analysis of the symmetry groups of neural networks. Algebraic adversarial examples provide a mathematically tractable approach to adversarial examples. We validate our approach of algebraic adversarial examples on two well-known and one real-world dataset.
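To make the idea of a network symmetry concrete, here is a minimal sketch. It is not the paper's construction. It assumes a hypothetical toy network (the names `W1`, `b1`, `w2`, `model`) and uses gradient-times-input as a stand-in post-hoc explanation. It relies on one simple symmetry: translating the input along the null space of the first-layer weight matrix leaves the network output unchanged, while an input-dependent attribution can still change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network (an assumption for illustration only):
# 4 inputs -> 3 hidden units (tanh) -> 1 scalar output.
W1 = rng.normal(size=(3, 4))
b1 = rng.normal(size=3)
w2 = rng.normal(size=3)

def model(x):
    """Scalar output of the toy network."""
    return w2 @ np.tanh(W1 @ x + b1)

def grad_times_input(x):
    """Gradient-times-input attribution, used here as a stand-in explainer."""
    z = W1 @ x + b1
    grad = W1.T @ (w2 * (1.0 - np.tanh(z) ** 2))  # d model / d x
    return grad * x

# W1 is 3x4, so it has a non-trivial null space. The last right-singular
# vector spans it when W1 has full row rank.
_, _, vt = np.linalg.svd(W1)
v = vt[-1]

x = rng.normal(size=4)
x_adv = x + 2.0 * v  # move along the symmetry direction

out_x, out_adv = model(x), model(x_adv)
attr_x, attr_adv = grad_times_input(x), grad_times_input(x_adv)

print("output unchanged:     ", np.isclose(out_x, out_adv))
print("attribution unchanged:", np.allclose(attr_x, attr_adv))
```

Because `x_adv` is obtained by an explicit transformation of `x`, one can trace exactly how it was generated, which is the property the abstract contrasts with optimisation-based attacks.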
