LG · ML · Jul 21, 2020

Inverting the Feature Visualization Process for Feedforward Neural Networks

arXiv:2007.10757v1
AI Analysis

This work provides an alternative view on network sensitivity for researchers in interpretability, but it is incremental as it builds on existing feature visualization techniques.

The paper tackles the problem of invertibility in feature visualization for neural networks, finding that inputs generated by activation maximization often do not match their intended feature objectives, and proposes a method to compute the optimal feature objective via a closed-form solution based on gradient minimization.

This work sheds light on the invertibility of feature visualization in neural networks. Since the input that is generated by feature visualization using activation maximization does not, in general, yield the feature objective it was optimized for, we investigate optimizing for the feature objective that yields this input. Given the objective function used in activation maximization that measures how closely a given input resembles the feature objective, we exploit that the gradient of this function w.r.t. inputs is (up to a scaling factor) linear in the objective. This observation is used to find the optimal feature objective by computing a closed-form solution that minimizes the gradient. By means of Inverse Feature Visualization, we intend to provide an alternative view on a network's sensitivity to certain inputs that considers feature objectives rather than activations.
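The closed-form step described in the abstract can be illustrated with a small sketch. It is not the paper's formulation: it assumes a simplified dot-product objective `v @ f(x)`, whose gradient with respect to the input is `J.T @ v` and therefore linear in the objective `v`, and it assumes a unit-norm constraint on `v` to exclude the trivial solution `v = 0`. The toy network, its weights, and the helper names `jacobian` and `inverse_objective` are hypothetical. Under these assumptions, the minimizer of the gradient norm is the left singular vector of the Jacobian `J` that belongs to its smallest singular value.

```python
import numpy as np

def jacobian(W1, W2, x):
    """Jacobian of the toy network f(x) = W2 @ tanh(W1 @ x) w.r.t. x."""
    h = np.tanh(W1 @ x)
    # Chain rule: d tanh(z)/dz = 1 - tanh(z)^2, applied elementwise.
    return W2 @ (np.diag(1.0 - h**2) @ W1)

def inverse_objective(J):
    """Unit vector v that minimizes ||J.T @ v|| (assumed objective: v @ f(x))."""
    # Columns of U are eigenvectors of J @ J.T. Singular values are sorted
    # in descending order, so the last column gives the smallest gradient norm.
    U, _, _ = np.linalg.svd(J)
    return U[:, -1]

# Hypothetical toy setup: 6 inputs, 8 hidden units, 4 output features.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 6))
W2 = rng.normal(size=(4, 8))
x = rng.normal(size=6)  # stands in for an input produced by activation maximization

J = jacobian(W1, W2, x)              # shape (4, 6)
v = inverse_objective(J)             # recovered feature objective, unit norm
grad_norm = np.linalg.norm(J.T @ v)  # equals the smallest singular value of J
```

Here `x` is a random vector rather than an actual activation-maximization result, so the sketch only demonstrates the linear-algebra step. A different objective, such as one with normalization terms, would change the matrix being decomposed but not the overall idea.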
