Associative embeddings for large-scale knowledge transfer with self-assessment
This addresses the need for large-scale annotated datasets in computer vision, though it is incremental as it builds on existing knowledge transfer methods.
The paper tackles the problem of automatically populating ImageNet with bounding-box and segmentation annotations by transferring knowledge from annotated to unannotated classes, achieving high localization accuracy (73% average overlap) for 30% of images.
We propose a method for knowledge transfer between semantically related classes in ImageNet. By transferring knowledge from the images that have bounding-box annotations to the others, our method is capable of automatically populating ImageNet with many more bounding-boxes and even pixel-level segmentations. The underlying assumption that objects from semantically related classes look alike is formalized in our novel Associative Embedding (AE) representation. AE recovers the latent low-dimensional space of appearance variations among image windows. The dimensions of AE space tend to correspond to aspects of window appearance (e.g. side view, close up, background). We model the overlap of a window with an object using Gaussian Processes (GP) regression, which spreads annotation smoothly through AE space. The probabilistic nature of GP allows our method to perform self-assessment, i.e. assigning a quality estimate to its own output. It enables trading off the amount of returned annotations for their quality. A large scale experiment on 219 classes and 0.5 million images demonstrates that our method outperforms state-of-the-art methods and baselines for both object localization and segmentation. Using self-assessment we can automatically return bounding-box annotations for 30% of all images with high localization accuracy (i.e.~73% average overlap with ground-truth).