LG · Jan 26, 2023

Certified Interpretability Robustness for Class Activation Mapping

Alex Gu, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel

arXiv:2301.11324v1 · 3.82 citations · h-index: 59

Originality: Incremental advance

AI Analysis

This work addresses the safety-critical need for robust interpretability in autonomous driving systems, though it appears incremental because it builds on existing CAM methods.

The paper tackles the problem of providing certified robustness guarantees for interpretability methods such as Class Activation Mapping (CAM). It presents CORGI, which computes certifiable lower bounds on the robustness of the top-k pixels in CAM maps. The authors demonstrate its effectiveness on traffic sign data, with certified lower bounds within 4-5x of the perturbation sizes found by state-of-the-art attack methods.

Interpreting machine learning models is challenging but crucial for ensuring the safety of deep networks in autonomous driving systems. Due to the prevalence of deep learning based perception models in autonomous vehicles, accurately interpreting their predictions is crucial. While a variety of such methods have been proposed, most are shown to lack robustness. Yet, little has been done to provide certificates for interpretability robustness. Taking a step in this direction, we present CORGI, short for Certifiably prOvable Robustness Guarantees for Interpretability mapping. CORGI is an algorithm that takes in an input image and gives a certifiable lower bound for the robustness of the top k pixels of its CAM interpretability map. We show the effectiveness of CORGI via a case study on traffic sign data, certifying lower bounds on the minimum adversarial perturbation not far from (4-5x) state-of-the-art attack methods.
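To make the certified quantity concrete, here is a minimal sketch of a standard CAM map and its top-k pixel set, written with NumPy. It is not CORGI and performs no certification. It only illustrates the object whose stability is being bounded. The function names, array shapes, and toy values are assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """Standard CAM for one class: a weighted sum of the last conv feature maps.

    features:      array of shape (C, H, W), final convolutional activations
    class_weights: array of shape (C,), weights from the pooled channels to the class logit
    """
    # Contract the channel axis, leaving an (H, W) map.
    return np.tensordot(class_weights, features, axes=1)


def top_k_pixels(cam, k):
    """Return the (row, col) positions of the k largest CAM values as a set."""
    flat_idx = np.argsort(cam, axis=None)[-k:]
    rows, cols = np.unravel_index(flat_idx, cam.shape)
    return set(zip(rows.tolist(), cols.tolist()))


def top_k_overlap(cam_a, cam_b, k):
    """Fraction of top-k pixels shared by two CAM maps (1.0 means the set is unchanged)."""
    return len(top_k_pixels(cam_a, k) & top_k_pixels(cam_b, k)) / k


# Toy example (hypothetical values): two channels on a 2x2 grid.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 1.0], [0.0, 0.0]]])
weights = np.array([1.0, 2.0])
cam = class_activation_map(features, weights)  # [[1, 2], [0, 0]]
top2 = top_k_pixels(cam, k=2)                  # {(0, 0), (0, 1)}
```

In an empirical check, the second map passed to `top_k_overlap` would come from running a perturbed input through the network. An attack searches for a small perturbation that lowers this overlap, which gives an upper bound on the minimum adversarial perturbation, whereas a certificate of the kind the abstract describes gives a lower bound: a perturbation size below which the top-k set provably cannot change.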
