Latent Equivariant Operators for Robust Object Recognition: Promise and Challenges
This work addresses the challenge of robust object recognition in computer vision, for applications that require generalization to unseen transformations. It is somewhat incremental, as it builds on existing concepts from equivariant networks.
The paper tackles the problem of recognizing objects under group-symmetric transformations that were not seen during training, such as unusual poses or scales. It studies architectures that learn equivariant operators in a latent space from examples, and demonstrates successful out-of-distribution classification on rotated and translated noisy MNIST datasets.
Despite the successes of deep learning in computer vision, difficulties persist in recognizing objects that have undergone group-symmetric transformations rarely seen during training, for example objects seen in unusual poses, scales, positions, or combinations thereof. Equivariant neural networks offer a solution to the problem of generalizing across symmetric transformations, but they require the transformations to be known a priori. An alternative family of architectures proposes to learn equivariant operators in a latent space from examples of symmetric transformations. Here, using simple datasets of rotated and translated noisy MNIST, we illustrate how such architectures can be successfully harnessed for out-of-distribution classification, thus overcoming the limitations of both traditional and equivariant networks. While these architectures are conceptually enticing, we discuss the challenges ahead on the path to scaling them to more complex datasets.
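To make the idea of a latent equivariant operator concrete, here is a minimal toy sketch in Python/NumPy. It is not the paper's architecture, and several simplifications are assumptions made only for illustration. The encoder `W` is a fixed random orthogonal linear map rather than a trained network. Inputs are 1D signals of length 16 rather than MNIST images. The symmetry is a cyclic translation (`np.roll`). The latent operator `M` is fit by ordinary least squares from pairs observed only for single-step shifts. All names (`encode`, `shift`, `latent_shift`) are hypothetical. Because `M` is fit so that the code of a shifted input equals `M` applied to the original code, powers of `M` should predict the codes of larger shifts that never appeared in the fitting data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16  # toy signal length (assumption, not from the paper)

# Hypothetical stand-in encoder: a random orthogonal matrix, chosen so the
# toy problem stays numerically well conditioned. A real model would learn it.
W, _ = np.linalg.qr(rng.normal(size=(n, n)))

def encode(x):
    """Map each row of x (an input signal) to a latent code."""
    return x @ W.T

def shift(x, k=1):
    """Group action: cyclic translation by k positions along the last axis."""
    return np.roll(x, k, axis=-1)

# Fitting data: only single-step shifts (k = 1) are observed.
X = rng.normal(size=(200, n))
Z = encode(X)
Z_next = encode(shift(X, 1))

# Least-squares fit of a linear latent operator M so that Z @ M.T ≈ Z_next.
M_T, *_ = np.linalg.lstsq(Z, Z_next, rcond=None)
M = M_T.T

def latent_shift(z, k):
    """Apply the learned operator k times to row-vector codes: z -> M^k z."""
    return z @ np.linalg.matrix_power(M, k).T

# Check on unseen inputs and on shift sizes never used for fitting.
X_test = rng.normal(size=(10, n))
errors = {
    k: float(np.max(np.abs(latent_shift(encode(X_test), k) - encode(shift(X_test, k)))))
    for k in (2, 5, 11)
}
print(errors)
```

Because this toy encoder is linear and invertible, an exact latent operator exists and the errors sit at floating-point level. With a learned nonlinear encoder, equivariance would in general hold only approximately.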