CVLGMLJun 19, 2019

Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations

arXiv:1906.08628v348 citations
AI Analysis

This work addresses the challenge of improving representation learning in computer vision by generalizing transformation equivariance, which is incremental as it builds on existing equivariance concepts.

The paper tackles the problem of learning visual representations that are equivariant to various transformations beyond translation, by proposing AutoEncoding Transformations (AET) and AutoEncoding Variational Transformations (AVT) models to capture complex patterns of visual structures. The result is that these models outperform state-of-the-art models in unsupervised and (semi-)supervised tasks, though no concrete numbers are provided in the abstract.

Transformation Equivariant Representations (TERs) aim to capture the intrinsic visual structures that equivary to various transformations by expanding the notion of {\em translation} equivariance underlying the success of Convolutional Neural Networks (CNNs). For this purpose, we present both deterministic AutoEncoding Transformations (AET) and probabilistic AutoEncoding Variational Transformations (AVT) models to learn visual representations from generic groups of transformations. While the AET is trained by directly decoding the transformations from the learned representations, the AVT is trained by maximizing the joint mutual information between the learned representation and transformations. This results in Generalized TERs (GTERs) equivariant against transformations in a more general fashion by capturing complex patterns of visual structures beyond the conventional linear equivariance under a transformation group. The presented approach can be extended to (semi-)supervised models by jointly maximizing the mutual information of the learned representation with both labels and transformations. Experiments demonstrate the proposed models outperform the state-of-the-art models in both unsupervised and (semi-)supervised tasks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes