Pretrained equivariant features improve unsupervised landmark discovery
This work improves unsupervised landmark detection by addressing a core limitation of existing equivariance-based methods, their failure to produce equivariant intermediate convolutional features, though the contribution is incremental in nature.
The paper tackles unsupervised landmark discovery. It shows that existing equivariance-based methods fail to produce equivariant intermediate features, and it proposes a two-step approach: first learn pixel-based features, then train a landmark detector on those pre-trained features with the traditional equivariance method. The approach achieves state-of-the-art results on datasets such as BBC Pose and Cat-Head and performs comparably on other benchmarks.
Locating semantically meaningful landmark points is a crucial component of a large number of computer vision pipelines. Because few datasets with ground truth landmark annotations are available, it is important to design robust unsupervised and semi-supervised methods for landmark detection. Many recent unsupervised learning methods rely on the equivariance of landmarks to synthetic image deformations. Our work focuses on these widely used methods and sheds light on their core problem: their inability to produce equivariant intermediate convolutional features. This finding leads us to formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features and then using the pre-trained features to learn a landmark detector with the traditional equivariance method. Our method produces state-of-the-art results on several challenging landmark detection datasets, such as the BBC Pose dataset and the Cat-Head dataset. It performs comparably on a range of other benchmarks.
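To make the equivariance property concrete, the following minimal sketch measures how far a detector is from being equivariant to one synthetic deformation. It is an illustration only and not the paper's implementation: the brightest-pixel detector, the circular translation used as the deformation, and all function names are assumptions chosen to keep the example small.

```python
import numpy as np

def toy_detector(image):
    """Hypothetical stand-in for a landmark detector: (row, col) of the brightest pixel."""
    return np.array(np.unravel_index(np.argmax(image), image.shape), dtype=float)

def shift_image(image, dy, dx):
    """Synthetic deformation g (assumed here): circular translation by (dy, dx) pixels."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def shift_point(point, dy, dx, shape):
    """The same deformation g applied to a landmark coordinate."""
    return (point + np.array([dy, dx])) % np.array(shape)

def equivariance_error(detector, image, dy, dx):
    """Squared distance between detector(g(x)) and g(detector(x)).

    The value is zero when the detector is equivariant to this deformation.
    """
    detected_after = detector(shift_image(image, dy, dx))
    moved_detection = shift_point(detector(image), dy, dx, image.shape)
    return float(np.sum((detected_after - moved_detection) ** 2))

# A single bright pixel plays the role of a landmark in an 8x8 image.
image = np.zeros((8, 8))
image[2, 5] = 1.0
error = equivariance_error(toy_detector, image, dy=3, dx=-2)
```

In equivariance-based training, a quantity of this kind is typically minimized over many random deformations; the sketch only evaluates it for one fixed detector and one translation.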