Zero-shot Learning with Deep Neural Networks for Object Recognition
This review paper, aimed at researchers and practitioners in computer vision and machine learning, addresses the problem of recognizing objects without visual training data and provides an overview of existing methods and challenges.
This paper reviews deep neural network approaches for zero-shot learning (ZSL), where objects are recognized without visual training samples by mapping visual data to semantic prototypes. It highlights key findings that shaped the field and outlines current challenges.
Zero-shot learning (ZSL) deals with the ability to recognize objects without any visual training samples. To compensate for this lack of visual data, each class to be recognized is associated with a semantic prototype that reflects the essential features of the object. The general approach is to learn a mapping from visual data to semantic prototypes, then use it at inference to classify visual samples using only the class prototypes. Different settings of this general configuration can be considered depending on the use case of interest, in particular whether one wants to classify only objects from classes that were not used to learn the mapping, or whether unlabelled visual examples are available when learning the mapping. This chapter presents a review of the approaches based on deep neural networks to tackle the ZSL problem. We highlight findings that had a large impact on the evolution of this domain and list its current challenges.
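To make the general approach concrete, here is a minimal sketch of it on synthetic data. It is an illustration under stated assumptions, not a method taken from the reviewed literature: a linear ridge regression stands in for the deep network, the visual features and class prototypes are randomly generated, and every name (`fit_visual_to_semantic`, `predict`, `sample`, the dimensions and noise level) is hypothetical.

```python
import numpy as np


def fit_visual_to_semantic(X, S_targets, lam=0.1):
    """Ridge regression: find W minimizing ||X W - S_targets||^2 + lam ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S_targets)


def predict(X, W, prototypes):
    """Project samples into semantic space; return the index of the
    most cosine-similar row of `prototypes` for each sample."""
    Z = X @ W
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argmax(Z @ P.T, axis=1)


# Synthetic setup (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n_seen, n_unseen, sem_dim, vis_dim, per_class = 8, 3, 4, 16, 20
prototypes = rng.normal(size=(n_seen + n_unseen, sem_dim))  # one row per class
A = rng.normal(size=(sem_dim, vis_dim))  # hidden semantic-to-visual generator


def sample(class_ids):
    """Draw noisy synthetic visual features for the given classes."""
    labels = np.repeat(class_ids, per_class)
    X = prototypes[labels] @ A + 0.01 * rng.normal(size=(labels.size, vis_dim))
    return X, labels


seen_ids = np.arange(n_seen)
unseen_ids = np.arange(n_seen, n_seen + n_unseen)
X_train, y_train = sample(seen_ids)   # visual data exists only for seen classes
X_test, y_test = sample(unseen_ids)   # test samples come from unseen classes

# Learn the mapping on seen classes, then classify using unseen prototypes only.
W = fit_visual_to_semantic(X_train, prototypes[y_train])
pred = unseen_ids[predict(X_test, W, prototypes[unseen_ids])]
accuracy = float(np.mean(pred == y_test))
print(f"unseen-class accuracy: {accuracy:.2f}")
```

In this sketch the candidate set at inference contains only the unseen prototypes. Passing all prototypes to `predict` instead would correspond to the setting in which classes used to learn the mapping are also valid answers.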