CV, Sep 14, 2016

Warped Convolutions: Efficient Invariance to Spatial Transformations

arXiv:1609.04382v5, 116 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient spatial invariance in computer vision and offers a practical solution for tasks such as pose estimation, though it builds on prior work on generalized convolutions.

The paper tackles the problem of achieving efficient invariance to spatial transformations beyond translation in CNNs. It presents a method that applies a constant image warp followed by a convolution, matching the computational complexity of standard convolutions. It shows results on vehicle pose estimation in Google Earth (rotation and scale) and face pose estimation in AFLW (3D rotations under perspective).

Convolutional Neural Networks (CNNs) are extremely efficient, since they exploit the inherent translation-invariance of natural images. However, translation is just one of a myriad of useful spatial transformations. Can the same efficiency be attained when considering other spatial invariances? Such generalized convolutions have been considered in the past, but at a high computational cost. We present a construction that is simple and exact, yet has the same computational complexity that standard convolutions enjoy. It consists of a constant image warp followed by a simple convolution, which are standard blocks in deep learning toolboxes. With a carefully crafted warp, the resulting architecture can be made equivariant to a wide range of two-parameter spatial transformations. We show encouraging results in realistic scenarios, including the estimation of vehicle poses in the Google Earth dataset (rotation and scale), and face poses in Annotated Facial Landmarks in the Wild (3D rotations under perspective).
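The "constant warp followed by a convolution" idea can be illustrated with a small sketch. It assumes a log-polar warp, a natural choice for the rotation-and-scale case; the abstract does not spell out the warp, so this is an illustration rather than the authors' implementation. The test image, grid sizes, and the helper names `image`, `transformed`, `log_polar_warp`, and `correlate` are all hypothetical. The image is defined analytically so that sampling needs no interpolation and the check is exact up to floating-point error.

```python
import math

def image(x, y):
    # Hypothetical smooth test image, defined analytically.
    return math.exp(-((x - 1.5) ** 2 + (y - 0.5) ** 2)) + 0.5 * math.sin(x) * math.cos(2 * y)

def transformed(img, scale, angle):
    # Image scaled by `scale` and rotated by `angle` about the origin:
    # I'(x) = I(T^{-1} x), where T^{-1} rotates by -angle and divides by scale.
    def out(x, y):
        c, s = math.cos(-angle), math.sin(-angle)
        xr, yr = c * x - s * y, s * x + c * y
        return img(xr / scale, yr / scale)
    return out

def log_polar_warp(img, n_r=16, n_theta=24, r0=0.5, base=1.1):
    # Constant warp (assumed log-polar): row i samples radius r0 * base**i,
    # column j samples angle j * 2*pi / n_theta.
    d_theta = 2 * math.pi / n_theta
    return [[img(r0 * base ** i * math.cos(j * d_theta),
                 r0 * base ** i * math.sin(j * d_theta))
             for j in range(n_theta)] for i in range(n_r)]

def correlate(warped, kernel):
    # Plain cross-correlation: valid along radius, circular along angle.
    kh, kw = len(kernel), len(kernel[0])
    n_r, n_t = len(warped), len(warped[0])
    return [[sum(kernel[p][q] * warped[i + p][(j + q) % n_t]
                 for p in range(kh) for q in range(kw))
             for j in range(n_t)] for i in range(n_r - kh + 1)]

# Scale by base**a and rotate by b angular steps, then compare feature maps.
a, b = 3, 5
base, n_theta = 1.1, 24
kernel = [[1.0, -1.0], [0.5, 2.0]]  # arbitrary filter, for illustration only
F = correlate(log_polar_warp(image), kernel)
F2 = correlate(log_polar_warp(transformed(image, base ** a, b * 2 * math.pi / n_theta)), kernel)

# Equivariance: the transformed image's features are the original features
# shifted by (a, b) in the warped grid (rows that fall off the grid are skipped).
max_err = max(abs(F2[i][j] - F[i - a][(j - b) % n_theta])
              for i in range(a, len(F)) for j in range(n_theta))
```

In this sketch, rotating and scaling the input becomes a plain shift of the warped image, so an ordinary convolution on the warped grid is equivariant to both at the usual cost. In a deep learning toolbox the warp would typically be a fixed resampling grid with interpolation rather than analytic sampling.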

