Deep saliency: What is learnt by a deep network about saliency?
This paper addresses the interpretability of CNNs for computer vision researchers, though its contribution is incremental because it builds on existing CNN methods.
The paper investigates what deep convolutional neural networks learn about saliency detection, showing that fine-tuning a pre-trained network on this task transforms its deeper layers, which develop receptive fields resembling the centre-surround filters hypothesized by early saliency research.
Deep convolutional neural networks have achieved impressive performance on a broad range of problems, beating prior art on established benchmarks, but it often remains unclear what representations these systems learn and how they achieve such performance. This article examines the specific problem of saliency detection, where benchmarks are currently dominated by CNN-based approaches, and investigates the properties of the learnt representation by visualizing the artificial neurons' receptive fields. We demonstrate that fine-tuning a pre-trained network on the saliency detection task leads to a profound transformation of the network's deeper layers. Moreover, we argue that this transformation leads to the emergence of receptive fields conceptually similar to the centre-surround filters hypothesized by early research on visual saliency.
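For readers unfamiliar with the term, a centre-surround filter can be illustrated with a difference of Gaussians: a narrow excitatory centre minus a broad inhibitory surround. The sketch below is only an illustration under that assumption; the abstract does not say which filter model the paper uses for comparison, and the function names, kernel size, and sigma values are hypothetical choices.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Return a 2-D Gaussian kernel of shape (size, size) that sums to 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a central pixel")
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()


def centre_surround_kernel(size=21, sigma_centre=1.5, sigma_surround=4.0):
    """Difference of Gaussians: narrow centre minus broad surround.

    The default values are illustrative assumptions, not taken from the paper.
    """
    centre = gaussian_kernel(size, sigma_centre)
    surround = gaussian_kernel(size, sigma_surround)
    return centre - surround


kernel = centre_surround_kernel()

# Response of the filter to two image patches of the same size as the kernel.
uniform_patch = np.ones_like(kernel)        # flat region, nothing stands out
spot_patch = np.zeros_like(kernel)
spot_patch[10, 10] = 1.0                    # single bright pixel at the centre

uniform_response = float((kernel * uniform_patch).sum())
spot_response = float((kernel * spot_patch).sum())
```

Because both Gaussians are normalised, the kernel sums to approximately zero, so a uniform patch produces almost no response, while a spot that differs from its surroundings produces a positive one. This is the sense in which such a filter responds to local contrast rather than to absolute intensity.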