An interpretation of the final fully connected layer
This work addresses the problem of interpretability in neural networks for researchers and practitioners, but it is incremental as it builds on existing methods without introducing a new paradigm.
The paper tackles the difficulty of interpreting neural network outputs by proposing a method to understand the weights in the final fully connected layer of image classification models, using a connection between policy gradient in RL and supervised learning to identify discriminative and confusing image parts, with results reported on pre-trained models.
In recent years neural networks have achieved state-of-the-art accuracy for various tasks but the the interpretation of the generated outputs still remains difficult. In this work we attempt to provide a method to understand the learnt weights in the final fully connected layer in image classification models. We motivate our method by drawing a connection between the policy gradient objective in RL and supervised learning objective. We suggest that the commonly used cross entropy based supervised learning objective can be regarded as a special case of the policy gradient objective. Using this insight we propose a method to find the most discriminative and confusing parts of an image. Our method does not make any prior assumption about neural network achitecture and has low computational cost. We apply our method on publicly available pre-trained models and report the generated results.