CV · Aug 17, 2021

Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs

arXiv:2108.07884v1 · 35 citations
AI Analysis

The paper addresses the problem of understanding CNN representations, which matters to researchers in interpretability and robustness, though its contribution is incremental and builds on existing knowledge.

The paper challenges the assumption that global pooling removes spatial information in CNNs, showing that positional information is encoded channel-wise, and applies this to improve translation invariance and enable region-specific attacks.

In this paper, we challenge the common assumption that collapsing the spatial dimensions of a 3D (spatial-channel) tensor in a convolutional neural network (CNN) into a vector via global pooling removes all spatial information. Specifically, we demonstrate that positional information is encoded based on the ordering of the channel dimensions, while semantic information is largely not. Following this demonstration, we show the real-world impact of these findings by applying them to two applications. First, we propose a simple yet effective data augmentation strategy and loss function which improves the translation invariance of a CNN's output. Second, we propose a method to efficiently determine which channels in the latent representation are responsible for (i) encoding overall position information or (ii) region-specific positions. We first show that semantic segmentation has a significant reliance on the overall position channels to make predictions. We then show for the first time that it is possible to perform a 'region-specific' attack, and degrade a network's performance in a particular part of the input. We believe our findings and demonstrated applications will benefit research areas concerned with understanding the characteristics of CNNs.
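A toy sketch may help make the central claim concrete. It is not the paper's method. It assumes one simple mechanism, zero padding at the image border, and shows that a globally average-pooled vector can still differ depending on where an object sits. The kernels, the 7x7 image size, and all function names below are hypothetical choices made for illustration.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with a kernel, zero-padding so output size equals input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    # Pixels outside the image count as zero (zero padding).
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def global_avg_pool(fmap):
    """Collapse one H x W feature map into a single number."""
    h, w = len(fmap), len(fmap[0])
    return sum(sum(row) for row in fmap) / (h * w)


def pooled_features(image, kernels):
    """One pooled value per channel: the vector left after global pooling."""
    return [global_avg_pool(conv2d_same(image, k)) for k in kernels]


def image_with_dot(size, row, col):
    """A blank size x size image with a single bright pixel."""
    img = [[0.0] * size for _ in range(size)]
    img[row][col] = 1.0
    return img


# Three hypothetical channels (illustrative only, not learned filters).
KERNELS = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # sums the 3x3 neighbourhood
    [[0, 0, 0], [1, 0, 0], [0, 0, 0]],  # reads the left neighbour
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # reads the right neighbour
]

center = pooled_features(image_with_dot(7, 3, 3), KERNELS)
left_edge = pooled_features(image_with_dot(7, 3, 0), KERNELS)

print("dot at centre   :", center)
print("dot at left edge:", left_edge)
```

In this sketch the same dot produces different pooled vectors at the centre and at the left edge, and the difference shows up in specific channels: the third channel drops to zero at the left edge because its response falls outside the image. This only illustrates that pooling need not erase position; it does not reproduce the paper's analysis of which channels carry position in a trained network.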

Code Implementations · 1 repo
