LG, CR, ML | Jan 28, 2019

Characterizing the Shape of Activation Space in Deep Neural Networks

arXiv:1901.09496v2 | 10 citations
AI Analysis

This work offers a topological perspective on neural network activations and, from it, an alternative explanation for adversarial examples. The contribution is incremental and specific to interpretability in deep learning.

The authors address the problem of interpreting deep neural network representations by introducing a method to compute persistent homology over activation structures. Their analysis indicates that adversarial examples alter the dominant activation patterns induced by the original input rather than adding activations that target the semantic structure of the adversarial class.

The representations learned by deep neural networks are difficult to interpret in part due to their large parameter space and the complexities introduced by their multi-layer structure. We introduce a method for computing persistent homology over the graphical activation structure of neural networks, which provides access to the task-relevant substructures activated throughout the network for a given input. This topological perspective provides unique insights into the distributed representations encoded by neural networks in terms of the shape of their activation structures. We demonstrate the value of this approach by showing an alternative explanation for the existence of adversarial examples. By studying the topology of network activations across multiple architectures and datasets, we find that adversarial perturbations do not add activations that target the semantic structure of the adversarial class as previously hypothesized. Rather, adversarial examples are explainable as alterations to the dominant activation structures induced by the original image, suggesting the class representations learned by deep networks are problematically sparse on the input space.
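To make the idea of persistent homology over an activation graph concrete, here is a minimal sketch, not the authors' implementation. The abstract does not specify how the graph or the filtration is built, so everything below is an assumption made for illustration: a small bias-free ReLU network, an edge strength of |a_i * w_ij| (source activation times weight), a filtration that adds edges from strongest to weakest, and only 0-dimensional homology (connected components). The function names `activation_edges` and `zero_dim_persistence` are hypothetical.

```python
import numpy as np

def activation_edges(x, weights):
    """Build (u, v, strength) edges for a bias-free ReLU MLP on input x.

    Assumption: edge strength is |a_i * w_ij|, where a_i is the activation
    of the source neuron. Neurons are numbered consecutively across layers.
    """
    edges = []
    a = np.asarray(x, dtype=float)
    offset = 0
    for W in weights:
        n_in, n_out = W.shape
        contrib = np.abs(a[:, None] * W)
        for i in range(n_in):
            for j in range(n_out):
                if contrib[i, j] > 0:  # keep only edges that carry signal
                    edges.append((offset + i, offset + n_in + j, float(contrib[i, j])))
        a = np.maximum(a @ W, 0.0)  # ReLU forward pass to the next layer
        offset += n_in
    return edges

def zero_dim_persistence(edges):
    """0-dimensional persistence of a graph filtered by decreasing edge strength.

    A vertex is born at the strength of its strongest incident edge. When an
    edge merges two components, the one born at lower strength dies (elder
    rule). Zero-persistence pairs are dropped; surviving components are
    reported with death 0.0.
    """
    parent, birth = {}, {}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    pairs = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        for n in (u, v):
            if n not in parent:
                parent[n] = n
                birth[n] = w
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle; no change in components
        if birth[ru] < birth[rv]:
            ru, rv = rv, ru  # make ru the elder component
        if birth[rv] > w:
            pairs.append((birth[rv], w))
        parent[rv] = ru
    roots = sorted(n for n in parent if find(n) == n)
    return pairs + [(birth[r], 0.0) for r in roots]

# Toy network: 2 inputs -> 2 hidden -> 1 output (weights chosen by hand).
W1 = np.array([[3.0, 0.0], [0.0, 2.0]])
W2 = np.array([[0.25], [0.25]])
diagram = zero_dim_persistence(activation_edges([1.0, 1.0], [W1, W2]))
print(diagram)
```

Under these assumptions, each (birth, death) pair describes a connected substructure of strongly activated edges and the strength at which it merges into a more dominant one. Comparing such diagrams for a clean input and its adversarial counterpart is one way to examine how the dominant activation structure changes.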

Code Implementations: 1 repo