LG ML | Jul 4, 2021

Certifiably Robust Interpretation via Renyi Differential Privacy

Ao Liu, Xiaoyu Chen, Sijia Liu, Lirong Xia, Chuang Gan

arXiv:2107.01561v19.214 citations

Originality: Highly original

AI Analysis

This paper addresses unreliable interpretability in machine learning, a problem for users who rely on model explanations. It offers a certifiable solution with incremental improvements over existing methods.

The paper tackles the vulnerability of CNN interpretation maps to adversarial attacks by proposing a method based on Renyi differential privacy that provides provable top-k robustness against input perturbations. It reports about 10% better experimental robustness than existing approaches, about twice the robustness when computational resources are highly constrained, and improved accuracy.

Motivated by the recent discovery that the interpretation maps of CNNs can easily be manipulated by adversarial attacks against network interpretability, we study the problem of interpretation robustness from the new perspective of Rényi differential privacy (RDP). The advantages of our RDP-based interpretation method, Renyi-Robust-Smooth, are threefold. First, it offers provable and certifiable top-$k$ robustness: the top-$k$ important attributions of the interpretation map are provably robust under any input perturbation with bounded $\ell_d$-norm (for any $d\geq 1$, including $d = \infty$). Second, our method offers $\sim10\%$ better experimental robustness than existing approaches in terms of the top-$k$ attributions. Remarkably, the accuracy of Renyi-Robust-Smooth also exceeds that of existing approaches. Third, our method provides a smooth tradeoff between robustness and computational efficiency. Experimentally, its top-$k$ attributions are twice as robust as those of existing approaches when computational resources are highly constrained.
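To make the notion of top-$k$ robustness concrete, the sketch below measures how much of the top-$k$ attribution set survives between two interpretation maps, for example one computed on a clean input and one on a perturbed input. The overlap ratio used here is an assumption chosen for illustration, not necessarily the exact metric of the paper, and the function names `top_k_indices` and `top_k_overlap` are hypothetical.

```python
import numpy as np

def top_k_indices(attribution_map, k):
    """Return the set of flat indices holding the k largest attributions."""
    flat = np.asarray(attribution_map, dtype=float).ravel()
    # argpartition places the k largest entries in the last k slots
    # without fully sorting the array.
    return set(np.argpartition(flat, -k)[-k:].tolist())

def top_k_overlap(map_a, map_b, k):
    """Fraction of top-k positions shared by two maps (1.0 means identical top-k sets)."""
    shared = top_k_indices(map_a, k) & top_k_indices(map_b, k)
    return len(shared) / k

# Toy 2x2 interpretation maps (illustrative values, not data from the paper).
clean_map = np.array([[0.9, 0.1],
                      [0.5, 0.3]])
perturbed_map = np.array([[0.8, 0.6],
                          [0.2, 0.1]])

print(top_k_overlap(clean_map, clean_map, k=2))      # identical maps
print(top_k_overlap(clean_map, perturbed_map, k=2))  # one of two top positions shared
```

Under this measure, an overlap near 1.0 means the perturbation left the most important attributions in place. Tied attribution values can make the top-$k$ set ambiguous, so the toy maps use distinct values.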
