Using Sentences as Semantic Representations in Large Scale Zero-Shot Learning
This work addresses the poor scalability of zero-shot learning in large-scale applications, though the contribution is incremental, since it builds on existing multimodal approaches.
The paper tackles the problem of scaling zero-shot learning to large datasets by using short natural-language sentences as class descriptions, and finds that combining sentences with word embeddings significantly outperforms existing state-of-the-art methods.
Zero-shot learning (ZSL) aims to recognize instances of unseen classes, for which no visual instance is available during training, by learning multimodal relations between samples from seen classes and the corresponding class semantic representations. These class representations usually consist of either attributes, which do not scale well to large datasets, or word embeddings, which lead to poorer performance. A good trade-off could be to employ short natural-language sentences as class descriptions. We explore different ways of using such short descriptions in a ZSL setting and show that, while simple methods cannot achieve very good results with sentences alone, a combination of standard word embeddings and sentences can significantly outperform the current state of the art.
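As a rough illustration of how word embeddings and sentence representations might be combined into a single class prototype, the following Python sketch concatenates the two L2-normalized vectors per class, fits a linear map from visual features to that joint space, and labels test samples by their nearest unseen-class prototype. Everything specific here is an assumption made for illustration: the abstract does not say how the combination is performed, and concatenation, ridge regression, cosine-similarity matching, and all function names are hypothetical choices rather than the authors' method.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm (eps guards against zero vectors)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def combine_prototypes(word_emb, sent_emb):
    """Assumed combination: concatenate normalized word and sentence embeddings.

    word_emb: (n_classes, d_word), sent_emb: (n_classes, d_sent)
    returns:  (n_classes, d_word + d_sent)
    """
    return np.concatenate([l2_normalize(word_emb), l2_normalize(sent_emb)], axis=1)


def fit_ridge(X, S, lam=1.0):
    """Closed-form ridge regression from visual features X to semantic targets S.

    X: (n_samples, d_visual), S: (n_samples, d_semantic)
    returns W: (d_visual, d_semantic)
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)


def predict_unseen(X, W, unseen_protos):
    """Project samples into semantic space and pick the most similar unseen class."""
    projected = l2_normalize(X @ W)
    protos = l2_normalize(unseen_protos)
    return np.argmax(projected @ protos.T, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for real embeddings and visual features.
    seen_protos = combine_prototypes(rng.normal(size=(5, 6)), rng.normal(size=(5, 8)))
    unseen_protos = combine_prototypes(rng.normal(size=(3, 6)), rng.normal(size=(3, 8)))
    labels = rng.integers(0, 5, size=200)
    X_train = rng.normal(size=(200, 10))
    W = fit_ridge(X_train, seen_protos[labels], lam=1.0)
    X_test = rng.normal(size=(4, 10))
    print(predict_unseen(X_test, W, unseen_protos))
```

The data in the demo is random, so the printed labels carry no meaning; the point is only the shape of the pipeline. Normalizing each modality before concatenation keeps either one from dominating the similarity purely because of its scale, which is one plausible reason to prefer it over raw concatenation.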