CV, LG | Jan 14, 2020

The problems with using STNs to align CNN feature maps

Lukas Finnveden, Ylva Jansson, Tony Lindeberg

arXiv:2001.05858v1

Originality: Synthesis-oriented

AI Analysis

The paper addresses a limitation of STNs in computer vision, but the contribution is incremental because it builds on existing methods.

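The paper argues that spatial transformer networks (STNs) cannot, in the general case, align the feature maps of a transformed image with those of the original, shows that this inability is coupled with decreased classification accuracy, and advocates sharing parameters between the classification and localisation networks instead.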

Spatial transformer networks (STNs) were designed to enable CNNs to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image and its original. We present a theoretical argument for this and investigate the practical implications, showing that this inability is coupled with decreased classification accuracy. We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.
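
The alignment claim in the abstract can be illustrated with a small NumPy sketch. This is not the authors' experiment: it is a toy example that assumes a single 3x3 filter, a random 8x8 image, and a 90-degree rotation as the image transformation, and the helper `correlate2d_valid` is a hypothetical name introduced only here. It checks whether rotating the feature map of the rotated image back into place recovers the feature map of the original image.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Plain 'valid' cross-correlation, as computed by a CNN layer without padding."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))   # toy input (assumption)
rotated_image = np.rot90(image)       # the "transformed" image

# An oriented filter: responds to horizontal edges.
horizontal_edge = np.array([[1.0, 1.0, 1.0],
                            [0.0, 0.0, 0.0],
                            [-1.0, -1.0, -1.0]])

features_original = correlate2d_valid(image, horizontal_edge)
features_rotated = correlate2d_valid(rotated_image, horizontal_edge)

# A purely spatial transformation of the feature map: rotate it back.
realigned = np.rot90(features_rotated, k=-1)
spatial_alignment_works = np.allclose(realigned, features_original)

# The realigned map instead equals the response of a *different* filter
# (the rotated one) on the original image.
features_other_filter = correlate2d_valid(image, np.rot90(horizontal_edge, k=-1))
matches_rotated_filter = np.allclose(realigned, features_other_filter)

# Special case: a rotation-symmetric filter can be aligned spatially.
isotropic = np.ones((3, 3))
isotropic_alignment_works = np.allclose(
    np.rot90(correlate2d_valid(rotated_image, isotropic), k=-1),
    correlate2d_valid(image, isotropic),
)

print(spatial_alignment_works, matches_rotated_filter, isotropic_alignment_works)
```

In this toy setting, moving the feature values to the right locations is not enough, because the values themselves would also have to change as if a different filter had been applied. Only the rotation-symmetric filter is aligned by the spatial transformation alone.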
