Topographic VAEs learn Equivariant Capsules
This work addresses unsupervised learning of equivariant representations in neural networks. Its approach extends existing group equivariant methods, though the contribution is incremental in nature.
The paper bridges topographic organization and equivariance in neural networks by introducing the Topographic VAE, which learns sets of approximately equivariant features (capsules) from sequences. The model achieves higher likelihood on transforming test sequences, and its equivariance is verified through quantitative commutativity measurements.
In this work we seek to bridge the concepts of topographic organization and equivariance in neural networks. To accomplish this, we introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables. We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST. Furthermore, through topographic organization over time (i.e. temporal coherence), we demonstrate how predefined latent space transformation operators can be encouraged for observed transformed input sequences -- a primitive form of unsupervised learned equivariance. We demonstrate that this model successfully learns sets of approximately equivariant features (i.e. "capsules") directly from sequences and achieves higher likelihood on correspondingly transforming test sequences. Equivariance is verified quantitatively by measuring the approximate commutativity of the inference network and the sequence transformations. Finally, we demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
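To make the commutativity check concrete, the following is a minimal sketch of how such a measurement could be set up. It rests on assumptions not stated in the text above: the latent transformation operator is taken to be a cyclic roll within each capsule, and the gap is scored as a normalized L1 difference. The names `roll_capsules` and `equivariance_error`, and the toy encoders, are hypothetical and purely illustrative; they are not the paper's implementation.

```python
import numpy as np

def roll_capsules(z, shift, n_caps, cap_dim):
    """Cyclically shift activations within each capsule.

    Assumption: the latent operator is a cyclic roll along the capsule dimension.
    """
    z = z.reshape(n_caps, cap_dim)
    return np.roll(z, shift, axis=1).reshape(-1)

def equivariance_error(encode, transform, x, shift, n_caps, cap_dim):
    """Normalized L1 gap between encode(transform(x)) and roll(encode(x)).

    A value of 0 means the encoder and the transformation commute exactly.
    """
    lhs = encode(transform(x))
    rhs = roll_capsules(encode(x), shift, n_caps, cap_dim)
    return np.abs(lhs - rhs).sum() / (np.abs(lhs).sum() + 1e-12)

# Toy setup: a single capsule of 8 units; the input transformation is a cyclic shift.
n_caps, cap_dim = 1, 8
x = np.arange(8, dtype=float)
transform = lambda v: np.roll(v, 1)

# A pointwise scaling commutes with cyclic shifts, so it is exactly equivariant.
encode_good = lambda v: 2.0 * v
# A position-dependent scaling does not commute with cyclic shifts.
encode_bad = lambda v: v * np.arange(1, 9)

err_good = equivariance_error(encode_good, transform, x, 1, n_caps, cap_dim)
err_bad = equivariance_error(encode_bad, transform, x, 1, n_caps, cap_dim)
print(err_good, err_bad)
```

In practice `encode` would be the trained inference network and `transform` the observed sequence transformation; an error near zero would then indicate approximate equivariance under the assumed roll operator.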