Unsupervised learning of object landmarks by factorized spatial embeddings
This paper addresses the challenge of characterizing object structure without supervision for computer vision applications, though it appears incremental because it builds on existing unsupervised learning methods.
The paper tackles the problem of automatically learning object structure by proposing an unsupervised approach that discovers and learns landmarks in object categories, achieving high accuracy in predicting manually-annotated landmarks on face benchmark datasets.
Automatically learning the structure of object categories remains an important open problem in computer vision. In this paper, we propose a novel unsupervised approach that can discover and learn landmarks in object categories, thus characterizing their structure. Our approach is based on factorizing image deformations, as induced by a viewpoint change or an object deformation, by learning a deep neural network that detects landmarks consistently with such visual effects. Furthermore, we show that the learned landmarks establish meaningful correspondences between different object instances in a category without having to impose this requirement explicitly. We assess the method qualitatively on a variety of object types, both natural and man-made. We also show that our unsupervised landmarks are highly predictive of manually annotated landmarks in face benchmark datasets, and can be used to regress these with a high degree of accuracy.
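The requirement that landmarks be detected "consistently" with image deformations can be illustrated with a small sketch. The code below is not the paper's network or training loss. It assumes a hand-written toy detector (the brightest pixel) and a cyclic translation as the deformation, and all names (`detect_landmark`, `translate`, `equivariance_loss`) are hypothetical. It only shows the kind of consistency check involved: the landmark found in the warped image should coincide with the warped landmark of the original image.

```python
import numpy as np

def detect_landmark(image):
    """Toy stand-in for a learned detector: (row, col) of the brightest pixel."""
    return np.array(np.unravel_index(np.argmax(image), image.shape))

def translate(image, shift):
    """Toy deformation g applied to an image: cyclic translation by (dy, dx)."""
    return np.roll(image, shift=tuple(int(s) for s in shift), axis=(0, 1))

def equivariance_loss(image, shift):
    """Squared distance between the landmark detected in the warped image
    and the original landmark moved by the same deformation."""
    shape = np.array(image.shape)
    warped_landmark = (detect_landmark(image) + shift) % shape
    detected = detect_landmark(translate(image, shift))
    return float(np.sum((detected - warped_landmark) ** 2))

# A synthetic 8x8 image with a single bright point at (2, 3).
image = np.zeros((8, 8))
image[2, 3] = 1.0
shift = np.array([3, 1])

loss = equivariance_loss(image, shift)
print(loss)  # 0.0 for this toy detector, which is consistent with translation
```

In a learned setting, a loss of this form would presumably be minimized over many images and random deformations, so the detector is pushed toward points that move with the object instead of staying fixed in the image frame.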