Increasing Interpretability of Neural Networks By Approximating Human Visual Saliency
This work addresses the challenge of making computer vision neural networks more interpretable for users, but it is incremental, as it builds on existing saliency methods.
The paper tackles the problem of costly human annotation for improving neural network interpretability by combining saliency incorporation with active learning, reducing required human annotations by 80% while maintaining interpretability gains of up to 30%.
Understanding where a model focuses within an image is critical for human interpretability of its decision-making process. Deep learning-based solutions are prone to learning coincidental correlations in training datasets, which causes over-fitting and reduces explainability. Recent advances have shown that guiding models toward human-defined salient regions within individual images significantly increases performance and interpretability. Human-guided models also generalize better, because they avoid relying on coincidental dataset features. Results show that models trained with saliency incorporation are up to 30% more interpretable than models trained without saliency information. Collecting this saliency information, however, can be costly, laborious and in some cases infeasible. To address this limitation, we propose a strategy that combines saliency incorporation with active learning, reducing the required human annotation data by 80% while maintaining the interpretability and performance gains from human saliency. Extensive experiments demonstrate the effectiveness of the proposed approach across five public datasets and six active learning criteria.
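As a rough illustration of how saliency incorporation and active learning might fit together, the sketch below ranks unlabeled images by predictive entropy, requests human saliency maps only for the most uncertain fraction, and adds a saliency-alignment term to the task loss for those images. The abstract does not specify the loss or the selection criteria, so everything here is an assumption: entropy is only one possible active learning criterion, mean squared error is only one possible alignment loss, and the names `select_for_annotation`, `saliency_loss`, `combined_loss`, `budget_fraction`, and `weight` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(pred_probs, budget_fraction=0.2):
    """Pick the most uncertain samples to send for human saliency annotation.

    budget_fraction=0.2 mirrors an 80% reduction in annotation effort;
    the entropy criterion itself is an assumption, not the paper's method.
    """
    n_select = max(1, round(budget_fraction * len(pred_probs)))
    ranked = sorted(
        range(len(pred_probs)),
        key=lambda i: entropy(pred_probs[i]),
        reverse=True,
    )
    return sorted(ranked[:n_select])


def saliency_loss(model_map, human_map):
    """Mean squared error between two flattened saliency maps (assumed form)."""
    if len(model_map) != len(human_map):
        raise ValueError("saliency maps must have the same number of pixels")
    return sum((m - h) ** 2 for m, h in zip(model_map, human_map)) / len(model_map)


def combined_loss(task_loss, model_map, human_map=None, weight=1.0):
    """Task loss plus a weighted saliency term when a human map is available."""
    if human_map is None:
        # Image was not selected for annotation: train on the task loss alone.
        return task_loss
    return task_loss + weight * saliency_loss(model_map, human_map)


# Five unlabeled images with two-class predictions (made-up numbers).
pred_probs = [[0.98, 0.02], [0.55, 0.45], [0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
to_annotate = select_for_annotation(pred_probs, budget_fraction=0.4)
print(to_annotate)
```

In a full training loop, the selection step would be repeated each round with the model's updated predictions, so the annotation budget is spent on the images the current model is least sure about.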