A Method for Restoring the Training Set Distribution in an Image Classifier
The work addresses model interpretability and robustness, which matter to machine learning researchers and practitioners, though it appears incremental because it builds on existing research into adversarial examples.
The paper tackles the problem of assessing the quality and robustness of convolutional neural networks for image classification. It introduces a method that reconstructs samples from the training set distribution without detailed knowledge of that distribution, which allows analysis of the image elements that most influence a decision and gives insight into the training distribution.
Convolutional neural networks are a well-known staple of modern image classification. However, it can be difficult to assess the quality and robustness of such models. Deep models are known to perform well on a given training and estimation set, but they can easily be fooled by data generated specifically for that purpose. It has been shown that one can produce an artificial example that does not represent the desired class, yet activates the network in the desired way. This paper describes a new way of reconstructing a sample from the training set distribution of an image classifier without deep knowledge of the underlying distribution. The reconstruction gives access to the elements of images that most influence the decision of a convolutional network and makes it possible to extract meaningful information about the training distribution.
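The fooling phenomenon mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's reconstruction method: it is plain gradient ascent on the input, and it assumes a tiny linear softmax classifier in place of a convolutional network. The weights `W`, the bias `b`, the three-feature input, and the helper name `ascend_to_class` are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def ascend_to_class(W, b, x0, target, steps=200, lr=0.1):
    """Gradient ascent on the input to raise log p(target | x).

    Hypothetical helper: the model is a linear softmax classifier,
    used here only as a stand-in for a trained network.
    """
    x = x0.copy()
    for _ in range(steps):
        p = softmax(W @ x + b)
        # d/dx log p[target] = W[target] - sum_k p[k] * W[k]
        grad = W[target] - p @ W
        # Keep the input inside the valid "pixel" range [0, 1].
        x = np.clip(x + lr * grad, 0.0, 1.0)
    return x, softmax(W @ x + b)

# Assumed toy weights: each class responds to one input feature.
W = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])
b = np.array([1.0, 0.0, 0.0])   # bias that favours class 0

x_start = np.full(3, 0.5)       # uniform gray input
start_pred = int(np.argmax(softmax(W @ x_start + b)))

x_adv, probs = ascend_to_class(W, b, x_start, target=2)
final_pred = int(np.argmax(probs))

print("prediction before:", start_pred)
print("prediction after: ", final_pred)
print("optimised input:  ", x_adv)
```

The optimised input is driven purely by the model's gradients rather than by any real data, which is why such examples can trigger a confident prediction without representing the class. For a deep network the same loop would use automatic differentiation to obtain the input gradient.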