LG Mar 15, 2021

Understanding Invariance via Feedforward Inversion of Discriminatively Trained Classifiers

Piotr Teterwak, Chiyuan Zhang, Dilip Krishnan, Michael C. Mozer

arXiv:2103.07470v29.911 citations

Originality: Incremental advance

AI Analysis

This work provides a tool for researchers to analyze and improve discriminative models by exploring information flow and invariance in neural networks, though it is incremental in building on existing inversion techniques.

The paper asks what information remains in the logit vectors of discriminatively trained neural net classifiers. It develops a feedforward inversion model that produces high-fidelity reconstructions, which show that extraneous visual detail persists in the logits and which serve as a tool for exploring representations and invariance.

A discriminatively trained neural net classifier can fit the training data perfectly if all information about its input other than class membership has been discarded prior to the output layer. Surprisingly, past research has discovered that some extraneous visual detail remains in the logit vector. This finding is based on inversion techniques that map deep embeddings back to images. We explore this phenomenon further using a novel synthesis of methods, yielding a feedforward inversion model that produces remarkably high fidelity reconstructions, qualitatively superior to those of past efforts. When applied to an adversarially robust classifier model, the reconstructions contain sufficient local detail and global structure that they might be confused with the original image in a quick glance, and the object category can clearly be gleaned from the reconstruction. Our approach is based on BigGAN (Brock, 2019), with conditioning on logits instead of one-hot class labels. We use our reconstruction model as a tool for exploring the nature of representations, including: the influence of model architecture and training objectives (specifically robust losses), the forms of invariance that networks achieve, representational differences between correctly and incorrectly classified images, and the effects of manipulating logits and images. We believe that our method can inspire future investigations into the nature of information flow in a neural net and can provide diagnostics for improving discriminative models.
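The switch from one-hot labels to logits as the conditioning signal can be illustrated with a minimal NumPy sketch of a conditional batch-norm layer, the kind of mechanism class-conditional generators commonly use to inject conditioning. This is an assumption-laden illustration, not the authors' implementation: the function `conditional_batchnorm`, the projection matrices `w_gamma` and `w_beta`, and the synthetic `logits` are all hypothetical, and the abstract does not specify how the logits enter the generator.

```python
import numpy as np

def conditional_batchnorm(x, cond, w_gamma, w_beta, eps=1e-5):
    """Batch norm whose gain and bias are linear functions of a conditioning vector.

    x:       (N, C, H, W) feature maps
    cond:    (N, K) conditioning vectors (one-hot labels or classifier logits)
    w_gamma: (K, C) hypothetical projection producing per-channel gains
    w_beta:  (K, C) hypothetical projection producing per-channel biases
    """
    # Normalize each channel over the batch and spatial dimensions.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Per-sample, per-channel gain and bias derived from the conditioning vector.
    gamma = 1.0 + cond @ w_gamma  # (N, C)
    beta = cond @ w_beta          # (N, C)
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]

rng = np.random.default_rng(0)
N, C, H, W, K = 4, 8, 2, 2, 10
x = rng.normal(size=(N, C, H, W))
w_gamma = 0.1 * rng.normal(size=(K, C))
w_beta = 0.1 * rng.normal(size=(K, C))

labels = np.array([3, 3, 7, 7])
one_hot = np.eye(K)[labels]

# Synthetic stand-in for classifier logits: same argmax as the labels,
# plus per-image variation that a one-hot vector cannot carry.
logits = 5.0 * one_hot + rng.normal(size=(N, K))

out_one_hot = conditional_batchnorm(x, one_hot, w_gamma, w_beta)
out_logits = conditional_batchnorm(x, logits, w_gamma, w_beta)
```

With one-hot conditioning, every image of a given class receives the same gain and bias, so the conditioning carries only class identity. With logit conditioning, two images of the same class receive different modulation, which is what would let a generator recover image-specific detail from the logit vector.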
