Self-supervised visual feature learning with curriculum
This work addresses a bottleneck in self-supervised visual feature learning for computer vision applications, but it is incremental, as it builds on existing curriculum learning ideas.
The paper tackles self-supervised learning in which pretext tasks can be trivialized by low-level signals; removing those signals discards some semantically valuable information and affects how quickly the downstream task converges. The authors propose using curriculum learning to remove these signals progressively and report a significant increase in downstream convergence speed.
Self-supervised learning techniques have shown their ability to learn meaningful feature representations. This is made possible by training a model on pretext tasks that only require finding correlations between inputs or parts of inputs. However, such pretext tasks need to be carefully hand-selected to avoid low-level signals that could make them trivial. Moreover, removing those shortcuts often leads to the loss of some semantically valuable information. We show that this loss directly impacts the speed of learning of the downstream task. In this paper, we take inspiration from curriculum learning to progressively remove low-level signals and show that doing so significantly increases the speed of convergence of the downstream task.
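To make the idea of progressively removing low-level signals concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which signals are removed or what schedule is used, so everything here is an assumption for illustration: the linear ramp, the function names `removal_probability` and `apply_curriculum`, and the choice of color as the low-level cue (stripped by grayscale conversion) are all hypothetical.

```python
import random


def removal_probability(step, total_steps, start=0.0, end=1.0):
    """Hypothetical linear schedule: probability of stripping the cue at `step`.

    Assumption: the abstract does not specify a schedule; a linear ramp
    from `start` to `end` is used here purely as an illustration.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def to_grayscale(pixel):
    """Replace an (r, g, b) pixel by its luma, removing color information."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return (y, y, y)


def apply_curriculum(image, step, total_steps, rng):
    """Strip the color cue with a probability that grows during training.

    `image` is a flat list of (r, g, b) tuples. Early in training most
    samples keep the cue; late in training most samples lose it.
    """
    if rng.random() < removal_probability(step, total_steps):
        return [to_grayscale(px) for px in image]
    return image


if __name__ == "__main__":
    rng = random.Random(0)
    image = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    for step in (0, 50, 100):
        p = removal_probability(step, 100)
        print(step, p, apply_curriculum(image, step, 100, rng))
```

With this schedule the pretext task starts with the cue fully available and ends with it fully removed. Other schedules (step-wise, exponential) or other signal-removal transforms would fit the same interface.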