CV · LG · Jan 16, 2013

Zero-Shot Learning Through Cross-Modal Transfer

arXiv:1301.3666v2 · 1527 citations
AI Analysis

This work addresses zero-shot learning in computer vision, enabling object recognition without manually defined semantic features. The contribution is incremental in that it builds on prior zero-shot models.

The paper tackles the problem of recognizing objects in images when no training data exists for those objects, drawing its knowledge of them only from large unsupervised text corpora. It reports state-of-the-art performance on seen classes and reasonable results on unseen classes.

This work introduces a model that can recognize objects in images even if no training data is available for the objects. The only necessary knowledge about the unseen categories comes from unsupervised large text corpora. In our zero-shot framework, distributional information in language can be seen as spanning a semantic basis for understanding what objects look like. Most previous zero-shot learning models can only differentiate between unseen classes. In contrast, our model can both obtain state-of-the-art performance on classes that have thousands of training images and obtain reasonable performance on unseen classes. This is achieved by first applying outlier detection in the semantic space and then using two separate recognition models. Furthermore, our model does not require any manually defined semantic features for either words or images.
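The two-stage idea in the abstract (outlier detection in the semantic space, then one of two recognition models) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes image features have already been mapped into the word-vector space, uses an isotropic Gaussian around each seen-class vector as a stand-in novelty detector, and uses nearest-vector lookup as a stand-in for both recognition models. All names (`novelty_score`, `predict`, `threshold`, `sigma`) and the toy 2-D vectors are hypothetical.

```python
import numpy as np

def novelty_score(z, seen_vectors, sigma=1.0):
    """Best isotropic-Gaussian log-likelihood (up to a constant) of z
    under any seen class. Low values suggest z is an outlier."""
    d2 = ((seen_vectors - z) ** 2).sum(axis=1)
    return float((-d2 / (2.0 * sigma ** 2)).max())

def nearest(z, vectors):
    """Index of the vector closest to z in Euclidean distance."""
    return int(np.argmin(((vectors - z) ** 2).sum(axis=1)))

def predict(z, seen_vectors, unseen_vectors, threshold, sigma=1.0):
    """Route z to the unseen-class model if it looks like an outlier
    with respect to the seen classes, otherwise to the seen-class model."""
    if novelty_score(z, seen_vectors, sigma) < threshold:
        return ("unseen", nearest(z, unseen_vectors))
    # Assumption: nearest seen vector stands in for a trained classifier.
    return ("seen", nearest(z, seen_vectors))

# Toy 2-D "semantic space" (made-up values, for illustration only).
seen_vectors = np.array([[0.0, 0.0], [4.0, 0.0]])
unseen_vectors = np.array([[0.0, 5.0], [4.0, 5.0]])

near_seen = predict(np.array([0.2, -0.1]), seen_vectors, unseen_vectors, threshold=-2.0)
far_from_seen = predict(np.array([3.8, 4.6]), seen_vectors, unseen_vectors, threshold=-2.0)
print(near_seen, far_from_seen)
```

In this sketch the threshold trades off accuracy on seen classes against accuracy on unseen ones: raising it sends more inputs to the unseen-class branch.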

Code Implementations: 2 repos
