CV · Sep 17, 2015

Recurrent Spatial Transformer Networks

arXiv:1509.05329v1 · 51 citations
Originality: Incremental advance
AI Analysis

This work aims to improve accuracy in sequential image classification. The advance is incremental because it combines existing spatial transformer and recurrent network components.

The paper tackles digit classification in cluttered MNIST sequences by integrating a spatial transformer network into a recurrent neural network. It reports a single digit error of 1.5%, lower than that of convolutional networks (2.9%) and of convolutional networks with SPN layers (2.0%).

We integrate the recently proposed spatial transformer network (SPN) [Jaderberg et al., 2015] into a recurrent neural network (RNN) to form an RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST sequences. The proposed model achieves a single digit error of 1.5%, compared to 2.9% for a convolutional network and 2.0% for a convolutional network with SPN layers. The SPN outputs a zoomed, rotated and skewed version of the input image. We investigate different down-sampling factors (the ratio of pixels in the input to pixels in the output) for the SPN and show that the RNN-SPN model is able to down-sample the input images without deteriorating performance. The down-sampling in RNN-SPN can be thought of as adaptive down-sampling that minimizes the information loss in the regions of interest. We attribute the superior performance of the RNN-SPN to the fact that it can attend to a sequence of regions of interest.
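To make the sampling step in the abstract concrete, the NumPy sketch below applies a 2x3 affine transform to an output grid and reads the input image with bilinear interpolation, producing a smaller output when a down-sampling factor is set. It is an illustration only, not the authors' implementation. The function names (`affine_grid`, `bilinear_sample`, `spatial_transform`), the `downsample` parameter, the border clamping, and the hand-picked `theta` are all assumptions. In an RNN-SPN-style model the affine parameters would presumably be predicted by the network at each step rather than fixed by hand.

```python
import numpy as np


def affine_grid(theta, out_h, out_w):
    """Map every output pixel to normalized input coordinates in [-1, 1].

    theta is a (2, 3) affine matrix (hypothetical parameterization).
    """
    ys = np.linspace(-1.0, 1.0, out_h)
    xs = np.linspace(-1.0, 1.0, out_w)
    gx, gy = np.meshgrid(xs, ys)  # each has shape (out_h, out_w)
    coords = np.stack([gx.ravel(), gy.ravel(), np.ones(gx.size)])  # (3, N)
    src = theta @ coords  # (2, N): source x and y for each output pixel
    return src[0].reshape(out_h, out_w), src[1].reshape(out_h, out_w)


def bilinear_sample(image, x_src, y_src):
    """Sample a 2-D image at normalized coordinates with bilinear weights."""
    h, w = image.shape
    # Convert normalized coordinates to pixel coordinates and clamp to the
    # image border (an assumption; other border modes are possible).
    x = np.clip((x_src + 1.0) * (w - 1) / 2.0, 0.0, w - 1)
    y = np.clip((y_src + 1.0) * (h - 1) / 2.0, 0.0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = (1.0 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1.0 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1.0 - wy) * top + wy * bottom


def spatial_transform(image, theta, downsample=1):
    """Resample `image` under `theta`, shrinking each side by `downsample`."""
    h, w = image.shape
    x_src, y_src = affine_grid(theta, h // downsample, w // downsample)
    return bilinear_sample(image, x_src, y_src)


# Usage: zoom into the central half of a random image and down-sample by 3.
image = np.random.rand(60, 60)
theta = np.array([[0.5, 0.0, 0.0],
                  [0.0, 0.5, 0.0]])
glimpse = spatial_transform(image, theta, downsample=3)  # shape (20, 20)
```

Because the output grid is laid over only the region selected by `theta`, the reduced pixel budget is spent on that region rather than on the whole image. This is one way to read the abstract's description of the down-sampling as adaptive.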

Code Implementations: 2 repos
