CV, LG, IV | Mar 16, 2020

On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location

arXiv:2003.07064v2 | 273 citations
AI Analysis

This work addresses a fundamental limitation in CNN design, with particular relevance to image classification and video analysis, though its contribution is an incremental refinement of existing architectures.

The paper challenges the assumption that convolutional layers in CNNs are translation invariant, showing they exploit absolute spatial location via boundary effects, and proposes a solution to remove this encoding, improving translation invariance and benefiting small datasets.

In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant. We show that CNNs can and will exploit the absolute spatial location by learning filters that respond exclusively to particular absolute locations by exploiting image boundary effects. Because modern CNN filters have a huge receptive field, these boundary effects operate even far from the image boundary, allowing the network to exploit absolute spatial location all over the image. We give a simple solution to remove spatial location encoding which improves translation invariance and thus gives a stronger visual inductive bias which particularly benefits small data sets. We broadly demonstrate these benefits on several architectures and various applications such as image classification, patch matching, and two video classification datasets.
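The boundary effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce their proposed solution. It only shows, in one dimension and in pure Python, that zero padding makes a filter's response depend on absolute position even when the input is constant. The function name `conv1d`, the example kernel, and the circular-padding contrast are assumptions chosen for illustration.

```python
def conv1d(signal, kernel, mode):
    """Cross-correlate `signal` with an odd-length `kernel`, keeping the input length.

    mode="zero":     out-of-range samples count as 0 (the usual "same" zero padding).
    mode="circular": indices wrap around, so the signal has no boundary.
    """
    n, k = len(signal), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            idx = i + j - half
            if mode == "circular":
                acc += kernel[j] * signal[idx % n]
            elif 0 <= idx < n:
                acc += kernel[j] * signal[idx]
            # zero mode with idx out of range: the padded sample contributes nothing
        out.append(acc)
    return out


signal = [1.0] * 8         # constant input: the content carries no positional cue
kernel = [1.0, 0.0, -1.0]  # hypothetical example filter (a simple edge detector)

zero_out = conv1d(signal, kernel, "zero")
circ_out = conv1d(signal, kernel, "circular")

print(zero_out)  # [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
print(circ_out)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

With zero padding the filter fires only at the two ends of the signal, so its output marks the left and right borders, and later layers could use that as an absolute position cue. With circular padding the response is identical everywhere. Circular padding is used here only as a contrast case and should not be read as the remedy the paper proposes.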

Code Implementations: 3 repos
