Learning Finer-class Networks for Universal Representations
This work addresses data scarcity in real-world visual recognition, which matters to practitioners who need adaptable models, though the contribution is incremental because it builds on existing source-task diversification approaches.
The paper tackles the problem of learning universal representations for visual recognition when annotated data is scarce. It proposes a method that exploits finer classes for which no annotation exists, using unsupervised learning and a bottom-up split-and-merge strategy, and it reports significantly better performance than state-of-the-art methods on 10 target-tasks across multiple domains.
Many real-world visual recognition use-cases cannot directly benefit from state-of-the-art CNN-based approaches because they lack sufficient annotated data. The usual way to deal with this is to transfer a representation pre-learned on a large annotated source-task onto a target-task of interest. This raises the question of how "universal" the original representation is, that is to say how directly it adapts to many different target-tasks. To improve such universality, state-of-the-art methods train networks on a diversified source problem, which is modified by adding either generic or specific categories to the initial set of categories. In this vein, we propose a method that exploits classes finer than the most specific existing ones, for which no annotation is available. We rely on unsupervised learning and a bottom-up split-and-merge strategy. We show that our method learns more universal representations than state-of-the-art methods, leading to significantly better results on 10 target-tasks from multiple domains, using several network architectures, either alone or combined with networks learned at a coarser semantic level.
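To make the idea of finer-classes without annotation more concrete, here is a minimal sketch of one way a split-and-merge step could look. It is an illustration under assumptions, not the method as implemented in the paper: the abstract does not say which clustering algorithm or merge criterion is used, so plain k-means (the split) and a minimum cluster size (the merge) are stand-ins, and the names `split`, `merge`, `finer_labels`, `k` and `min_size` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Python 3.8+


def centroid(points):
    """Mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def split(points, k, iters=10):
    """Split step (assumed: k-means with farthest-point initialisation).

    Returns a list of clusters, each a list of indices into `points`.
    """
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for i, p in enumerate(points):
            nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(i)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [centroid([points[i] for i in g]) if g else c
                   for g, c in zip(groups, centers)]
    return [g for g in groups if g]


def merge(points, groups, min_size):
    """Merge step (assumed criterion): fold each cluster smaller than
    `min_size` into the remaining cluster with the nearest centroid."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        s = min(range(len(groups)), key=lambda j: len(groups[j]))
        if len(groups[s]) >= min_size:
            break
        small = groups.pop(s)
        c = centroid([points[i] for i in small])
        target = min(groups, key=lambda g: dist(c, centroid([points[i] for i in g])))
        target.extend(small)
    return groups


def finer_labels(features, labels, k=2, min_size=2):
    """Return one (original_class, sub_cluster_id) pseudo-label per sample."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    out = [None] * len(labels)
    for y, idxs in by_class.items():
        pts = [tuple(features[i]) for i in idxs]
        for sub_id, g in enumerate(merge(pts, split(pts, k), min_size)):
            for local in g:
                out[idxs[local]] = (y, sub_id)
    return out


# Toy usage: 2-D points stand in for CNN features of annotated source images.
features = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5), (5, 6), (50, 50)]
labels = ["cat", "cat", "cat", "cat", "dog", "dog", "dog"]
pseudo = finer_labels(features, labels, k=2, min_size=2)
```

On this toy data the two well-separated groups of "cat" points become two finer classes, while the single outlying "dog" point is merged back, so "dog" stays one class. In a pipeline of the kind the abstract describes, such pseudo-labels would then serve as training targets for a network on the source-task; that training step is not shown here.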