CV LG · May 4, 2020

On the Benefits of Models with Perceptually-Aligned Gradients

Gunjan Aggarwal, Abhishek Sinha, Nupur Kumari, Mayank Singh

arXiv:2005.01499v1 · 10.616 citations

Originality: Incremental advance

AI Analysis

This work aims to improve both the interpretability and the task performance of machine learning models. It is an incremental advance, since it builds on existing adversarial training methods.

The paper investigates models with perceptually-aligned gradients, showing that adversarial training with low perturbation bounds yields interpretable features without significant performance drops on clean data, and improves zero-shot and weakly supervised localization tasks.

Adversarially robust models have been shown to learn more robust and interpretable features than standard-trained models. As shown in [Tsipras et al., 2018], such robust models inherit useful interpretable properties: the gradient aligns perceptually well with images, and adding a large targeted adversarial perturbation yields an image that resembles the target class. We perform experiments to show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks. Specifically, we perform adversarial training against attacks with different max-perturbation bounds. Adversarial training with a low max-perturbation bound results in models that have interpretable features with only a slight drop in performance on clean samples. In this paper, we leverage models with interpretable, perceptually-aligned features and show that adversarial training with a low max-perturbation bound can improve the performance of models on zero-shot and weakly supervised localization tasks.
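To make the role of the max-perturbation bound concrete, here is a minimal sketch of adversarial training. The abstract does not name the attack or the model, so everything below is an assumption made for illustration: an L-infinity projected gradient (PGD-style) attack, a toy logistic-regression classifier in place of a deep network, and hypothetical function names (`pgd_attack`, `adversarial_train`). The parameter `eps` plays the role of the max-perturbation bound.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic model (toy stand-in for a network)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(w, b, x, y, eps, step, n_steps):
    """Assumed attack: signed gradient ascent on the loss, projected back
    into the L-infinity ball of radius eps around the clean input x."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: clip every coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def adversarial_train(data, eps, step, n_steps=5, lr=0.5, epochs=100):
    """SGD on adversarial examples; eps is the max-perturbation bound."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_attack(w, b, x, y, eps, step, n_steps)
            err = predict(w, b, x_adv) - y  # d(loss)/d(logit) at x_adv
            w = [wi - lr * err * xa for wi, xa in zip(w, x_adv)]
            b -= lr * err
    return w, b
```

In this sketch, a small `eps` keeps the adversarial inputs close to the clean ones, which is the regime the abstract describes as a low max-perturbation bound. For example, `adversarial_train([([1.0, 0.0], 1), ([0.0, 1.0], 0)], eps=0.05, step=0.025)` trains on points perturbed by at most 0.05 per coordinate.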
