Every Filter Extracts A Specific Texture In Convolutional Neural Networks
The work offers an intuitive explanation of how CNNs represent image style; the contribution is incremental, but it clarifies a known bottleneck in deep visualization.
The paper addresses the problem of understanding what individual filters in convolutional neural networks (CNNs) extract from images. It shows that each filter extracts a specific texture, with higher layers capturing more colors and more intricate structures, and demonstrates that image style can be represented as a combination of these textures.
Many works have concentrated on visualizing and understanding the inner mechanisms of convolutional neural networks (CNNs) by generating images that activate specific neurons, an approach called deep visualization. However, it is still unclear, in intuitive terms, what the filters extract from images. In this paper, we propose a modified code inversion algorithm, called feature map inversion, to understand the function of a filter of interest in a CNN. We reveal that every filter extracts a specific texture. Textures from higher layers contain more colours and more intricate structures. We also demonstrate that the style of an image can be a combination of these texture primitives. Two methods are proposed to reallocate the energy distribution of feature maps, randomly and purposefully. We then invert the modified code and generate images of diverse styles. With these results, we provide an explanation of why the Gram matrix of feature maps \cite{Gatys_2016_CVPR} can represent image style.
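To make the link between feature-map energy and the Gram matrix concrete, the following is a minimal NumPy sketch and not the paper's implementation. It assumes that the "energy" of a feature map is its mean squared activation, which is then exactly the corresponding diagonal entry of the normalized Gram matrix. The helper names `gram_matrix`, `channel_energy`, and `reallocate_energy` are hypothetical, the feature maps are random stand-ins rather than real CNN activations, and the inversion step that turns a modified code back into an image is omitted.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a (C, H, W) stack of feature maps, normalized by H*W."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)


def channel_energy(features):
    """Assumed definition of energy: mean squared activation per feature map."""
    return (features ** 2).mean(axis=(1, 2))


def reallocate_energy(features, weights):
    """Scale feature map k so that its energy is multiplied by weights[k]."""
    weights = np.asarray(weights, dtype=features.dtype)
    return features * np.sqrt(weights)[:, None, None]


rng = np.random.default_rng(0)
# Random non-negative maps standing in for post-ReLU activations (assumption).
feats = np.abs(rng.standard_normal((4, 8, 8)))

# Shuffling spatial positions (the same shuffle for every channel) leaves the
# Gram matrix unchanged: it records which filters respond and co-respond,
# not where in the image they respond.
perm = rng.permutation(8 * 8)
shuffled = feats.reshape(4, -1)[:, perm].reshape(4, 8, 8)

# Purposeful reallocation: boost filter 0, keep filters 1 and 2, mute filter 3.
weights = np.array([4.0, 1.0, 1.0, 0.0])
modified = reallocate_energy(feats, weights)

print(np.allclose(gram_matrix(feats), gram_matrix(shuffled)))
print(channel_energy(feats))
print(channel_energy(modified))
```

Under these assumptions, changing the per-filter energies changes the Gram matrix, while moving activations around spatially does not. This is consistent with reading the Gram matrix as a summary of which texture primitives are present and how strongly, independent of image layout.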