CV · LG · May 4, 2020

On the Benefits of Models with Perceptually-Aligned Gradients

arXiv:2005.01499v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses how to improve both the interpretability and the task performance of machine learning models, a question relevant to researchers and practitioners. It is an incremental advance, since it builds on existing adversarial training methods.

The paper investigates models with perceptually aligned gradients. It shows that adversarial training with low perturbation bounds yields interpretable features without a significant drop in performance on clean data, and that such models perform better on zero-shot and weakly supervised localization tasks.

Adversarially robust models have been shown to learn more robust and interpretable features than standard-trained models. As shown by Tsipras et al. [tsipras2018robustness], such robust models inherit useful interpretable properties: their gradients align perceptually well with images, and adding a large targeted adversarial perturbation leads to an image resembling the target class. We perform experiments to show that interpretable, perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks. Specifically, we perform adversarial training with an attack at different max-perturbation bounds. Adversarial training with a low max-perturbation bound results in models that have interpretable features with only a slight drop in performance on clean samples. In this paper, we leverage models with interpretable, perceptually aligned features and show that adversarial training with a low max-perturbation bound can improve the performance of models on zero-shot and weakly supervised localization tasks.
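The abstract does not name the attack used during training. The sketch below assumes an L-infinity projected gradient descent (PGD) attack, the usual choice in Madry-style adversarial training, and uses a toy linear softmax classifier on 2-D points rather than the image networks studied in the paper. All function names (`pgd_linf`, `adversarial_train`, `loss_and_grads`, `accuracy`) and hyperparameters (`step = eps / 4`, 7 attack steps, `lr = 0.1`) are hypothetical. It illustrates where the max-perturbation bound `eps` enters training (`eps = 0` recovers standard training) and how the same routine, run with `targeted=True` and a large `eps`, pushes inputs toward a chosen target class. Clipping to a valid pixel range is omitted because the toy inputs are not images.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grads(W, b, x, y):
    """Mean cross-entropy of a linear classifier with logits = x @ W + b.

    Returns the loss, its gradients w.r.t. W and b, and its gradient
    w.r.t. the inputs x (the quantity the attack follows)."""
    n = x.shape[0]
    p = softmax(x @ W + b)
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    d_logits = p.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    return loss, x.T @ d_logits, d_logits.sum(axis=0), d_logits @ W.T


def pgd_linf(W, b, x, y, eps, step, n_steps, rng, targeted=False):
    # Assumption: L-infinity PGD with a random start (the source abstract
    # does not name the attack). Untargeted: y are true labels and we ascend
    # the loss. Targeted: y are target labels and we descend the loss.
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(n_steps):
        _, _, _, gx = loss_and_grads(W, b, x_adv, y)
        direction = -np.sign(gx) if targeted else np.sign(gx)
        # Take a signed-gradient step, then project back into the eps-ball.
        x_adv = np.clip(x_adv + step * direction, x - eps, x + eps)
    return x_adv


def adversarial_train(x, y, n_classes, eps, epochs=200, lr=0.1, seed=0):
    # Each update is computed on PGD examples bounded by eps instead of on
    # clean inputs; eps=0 falls back to standard training.
    rng = np.random.default_rng(seed)
    W = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        if eps > 0:
            x_train = pgd_linf(W, b, x, y, eps, step=eps / 4, n_steps=7, rng=rng)
        else:
            x_train = x
        _, gW, gb, _ = loss_and_grads(W, b, x_train, y)
        W -= lr * gW
        b -= lr * gb
    return W, b


def accuracy(W, b, x, y):
    # Fraction of inputs whose arg-max logit equals the given label.
    return float((np.argmax(x @ W + b, axis=1) == y).mean())


if __name__ == "__main__":
    # Toy data: two Gaussian blobs, one per class.
    rng = np.random.default_rng(1)
    x = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    # Sweep the max-perturbation bound and report clean accuracy.
    for eps in (0.0, 0.1, 0.5):
        W, b = adversarial_train(x, y, n_classes=2, eps=eps)
        print(f"eps={eps}: clean accuracy={accuracy(W, b, x, y):.3f}")
```

Sweeping `eps` in the demo loosely mirrors the paper's experiment of varying the max-perturbation bound. A closer reproduction would replace the linear model with a deep network and obtain input gradients from an autodiff framework instead of the hand-derived `loss_and_grads`.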
