Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories
This work addresses the scalability and cost issues in attribute-based models for computer vision, making them more applicable to large datasets.
The paper tackles the cost of manually defining attributes and class-attribute associations in attribute-based recognition. It proposes an unsupervised approach that automatically discovers an attribute vocabulary from online text and learns class-attribute associations using a deep convolutional model with a linguistic prior. It reports improved zero-shot learning performance on three datasets (ImageNet, Animals with Attributes, and aPascal/aYahoo) and enables large-scale attribute-based learning on ImageNet.
Attribute-based recognition models, due to their impressive performance and their ability to generalize well to novel categories, have been widely adopted for many computer vision applications. However, both the attribute vocabulary and the class-attribute associations usually have to be provided manually by domain experts or a large number of annotators. This is very costly and not necessarily optimal regarding recognition performance, and most importantly, it limits the applicability of attribute-based models to large-scale data sets. To tackle this problem, we propose an end-to-end unsupervised attribute learning approach. We utilize online text corpora to automatically discover a salient and discriminative vocabulary that correlates well with the human concept of semantic attributes. Moreover, we propose a deep convolutional model to optimize class-attribute associations with a linguistic prior that accounts for noise and missing data in text. In a thorough evaluation on ImageNet, we demonstrate that our model is able to efficiently discover and learn semantic attributes at a large scale. Furthermore, we demonstrate that our model outperforms the state-of-the-art in zero-shot learning on three data sets: ImageNet, Animals with Attributes and aPascal/aYahoo. Finally, we enable attribute-based learning on ImageNet and will share the attributes and associations for future research.
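To make concrete how class-attribute associations support zero-shot recognition, the following is a minimal sketch of a generic attribute-signature classifier, not the model proposed in the paper. It assumes attribute probabilities have already been predicted for each image and that associations are given as a binary matrix; the function name `zero_shot_predict`, the attribute names, and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np

def zero_shot_predict(attr_probs, class_attr, eps=1e-6):
    """Assign each image to the unseen class whose attribute signature fits best.

    attr_probs: (n_images, n_attrs) predicted attribute probabilities in [0, 1].
    class_attr: (n_classes, n_attrs) binary class-attribute associations.
    Returns an array of n_images class indices.
    """
    # Clip to avoid log(0) for overconfident attribute predictions.
    p = np.clip(np.asarray(attr_probs, dtype=float), eps, 1.0 - eps)
    m = np.asarray(class_attr, dtype=float)
    # Bernoulli log-likelihood of each class signature under the predictions.
    log_lik = np.log(p) @ m.T + np.log1p(-p) @ (1.0 - m).T
    return np.argmax(log_lik, axis=1)

# Hypothetical toy data: attributes ("striped", "aquatic", "furry"), two unseen classes.
class_attr = np.array([[1, 0, 1],    # e.g. "zebra"
                       [0, 1, 0]])   # e.g. "dolphin"
attr_probs = np.array([[0.9, 0.1, 0.8],
                       [0.2, 0.7, 0.1]])
predictions = zero_shot_predict(attr_probs, class_attr)
```

Because the classifier only needs a row of `class_attr` per class, a category with no training images can be recognized as soon as its associations are available, which is why obtaining those associations automatically from text matters at ImageNet scale.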