LG | ML | May 26, 2021

Fooling Partial Dependence via Data Poisoning

arXiv:2105.12837v3 | 29 citations
Originality: Incremental advance
AI Analysis

This work exposes a security flaw in model explainability tools. It is rated incremental because it builds on known robustness issues, although it introduces new attack methods.

The paper demonstrates that Partial Dependence (PD) explanations of predictive models can be manipulated through data poisoning: genetic and gradient algorithms bend the explanations in a desired direction, which poses risks in critical domains such as finance and medicine.

Many methods have been developed to understand complex predictive models, and high expectations are placed on post-hoc model explainability. It turns out that such explanations are neither robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which are among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability has become a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. We believe this to be the first work using a genetic algorithm for manipulating explanations, which is transferable as it generalizes both ways: in a model-agnostic and an explanation-agnostic manner.
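To make the idea of bending a PD profile by poisoning data concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a hypothetical fixed black box `toy_model` with an interaction between two features, and a simplified mutation-plus-selection loop (no crossover) that perturbs only the features other than the explained one. The names `partial_dependence`, `poison_genetic`, and all parameter values are illustrative assumptions.

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Average prediction with column `feature` forced to each grid value."""
    values = []
    for z in grid:
        X_mod = X.copy()
        X_mod[:, feature] = z
        values.append(model(X_mod).mean())
    return np.array(values)

def toy_model(X):
    # Hypothetical fixed black box: a pure interaction between two features.
    return X[:, 0] * X[:, 1]

def poison_genetic(model, X, feature, grid, target,
                   pop_size=20, generations=50, sigma=0.1, seed=0):
    """Simplified genetic search: mutate the data, keep the fittest datasets.

    Assumption: the explained feature's column is left untouched and the
    model itself is never retrained or modified.
    """
    rng = np.random.default_rng(seed)
    other = [j for j in range(X.shape[1]) if j != feature]

    def loss(X_candidate):
        pd_values = partial_dependence(model, X_candidate, feature, grid)
        return np.mean((pd_values - target) ** 2)

    population = [X.copy() for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for parent in population:
            child = parent.copy()
            # Mutation: Gaussian noise on the non-explained features only.
            child[:, other] += rng.normal(0.0, sigma, size=(X.shape[0], len(other)))
            children.append(child)
        # Selection with elitism: parents compete with their children.
        candidates = sorted(population + children, key=loss)
        population = candidates[:pop_size]
    return population[0]

rng = np.random.default_rng(42)
X = rng.normal(1.0, 1.0, size=(200, 2))
grid = np.linspace(-2.0, 2.0, 9)

original = partial_dependence(toy_model, X, 0, grid)
target = -original  # attacker's goal: flip the slope of the profile
X_poisoned = poison_genetic(toy_model, X, 0, grid, target)
poisoned = partial_dependence(toy_model, X_poisoned, 0, grid)

print("distance to target before:", np.mean((original - target) ** 2))
print("distance to target after: ", np.mean((poisoned - target) ** 2))
```

The model is identical before and after the attack. Only the dataset used to compute the explanation changes, which is why the PD profile can move toward the attacker's target while the predictions for any fixed input stay the same.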

Code Implementations: 1 repo
