An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild
This work addresses a practical limitation of zero-shot learning for real-world object recognition, though its contribution is incremental: it builds on existing methods by adding a calibration step.
The paper tackles generalized zero-shot learning (GZSL) for object recognition. It shows that existing zero-shot learning methods perform poorly when test data include both seen and unseen classes, proposes a calibration method that balances recognition between the two, and presents an analysis revealing a large performance gap relative to an upper bound.
Zero-shot learning (ZSL) methods have been studied in the unrealistic setting where test data are assumed to come from unseen classes only. In this paper, we advocate studying the problem of generalized zero-shot learning (GZSL) where the test data's class memberships are unconstrained. We show empirically that naively using the classifiers constructed by ZSL approaches does not perform well in the generalized setting. Motivated by this, we propose a simple but effective calibration method that can be used to balance two conflicting forces: recognizing data from seen classes versus those from unseen ones. We develop a performance metric to characterize such a trade-off and examine the utility of this metric in evaluating various ZSL approaches. Our analysis further shows that there is a large gap between the performance of existing approaches and an upper bound established via idealized semantic embeddings, suggesting that improving class semantic embeddings is vital to GZSL.
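The abstract does not spell out the calibration rule or the trade-off metric. The following is a minimal sketch under stated assumptions: calibration subtracts a constant gamma from every seen-class score before taking the argmax, so larger gamma shifts predictions toward unseen classes; the metric is the area under the curve traced by seen-class accuracy against unseen-class accuracy as gamma is swept. The function names, the toy scores, and the use of plain per-sample accuracy (rather than any per-class averaging) are illustrative assumptions, not the paper's code.

```python
from typing import Sequence, Set, Tuple

def calibrated_predict(scores: Sequence[float], seen: Set[int], gamma: float) -> int:
    """Return the argmax class after subtracting gamma from every seen-class score."""
    adjusted = [s - gamma if c in seen else s for c, s in enumerate(scores)]
    # max() with a key returns the first index among ties
    return max(range(len(adjusted)), key=adjusted.__getitem__)

def seen_unseen_accuracy(all_scores: Sequence[Sequence[float]], labels: Sequence[int],
                         seen: Set[int], gamma: float) -> Tuple[float, float]:
    """Per-sample accuracy on seen-class and unseen-class test items, predicting over ALL classes."""
    hits = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for scores, y in zip(all_scores, labels):
        is_seen = y in seen
        counts[is_seen] += 1
        hits[is_seen] += calibrated_predict(scores, seen, gamma) == y
    return hits[True] / counts[True], hits[False] / counts[False]

def ausuc(all_scores: Sequence[Sequence[float]], labels: Sequence[int],
          seen: Set[int], gammas: Sequence[float]) -> float:
    """Area under the seen-vs-unseen accuracy curve, integrated with the trapezoid rule.

    Sweeping gamma upward can only move predictions from seen to unseen classes,
    so unseen accuracy is non-decreasing along the sorted sweep.
    """
    curve = [seen_unseen_accuracy(all_scores, labels, seen, g) for g in sorted(gammas)]
    area = 0.0
    for (s0, u0), (s1, u1) in zip(curve, curve[1:]):
        area += (u1 - u0) * (s0 + s1) / 2
    return area

# Hypothetical toy data: classes 0 and 1 are seen, classes 2 and 3 are unseen.
SEEN = {0, 1}
SCORES = [
    [0.9, 0.1, 0.3, 0.2],  # true class 0
    [0.2, 0.8, 0.6, 0.1],  # true class 1
    [0.7, 0.2, 0.3, 0.1],  # true class 2
    [0.1, 0.6, 0.2, 0.4],  # true class 3
]
LABELS = [0, 1, 2, 3]

if __name__ == "__main__":
    print(seen_unseen_accuracy(SCORES, LABELS, SEEN, 0.0))  # uncalibrated: (1.0, 0.0)
    print(ausuc(SCORES, LABELS, SEEN, [-1.0, 0.1, 0.3, 0.5, 0.7, 2.0]))  # -> 0.625
```

With gamma = 0 the toy classifier never predicts an unseen class, which mirrors the failure mode the abstract describes for naive GZSL; raising gamma trades seen accuracy for unseen accuracy. The gamma grid should include values extreme enough to reach both ends of the curve, otherwise the area is truncated.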