Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing
This work addresses the inefficiency of customizing pre-training for different tasks in self-supervised learning, offering practitioners a scalable solution.
The paper tackles the problem of irrelevant data in self-supervised pre-training degrading downstream task performance. It proposes Scalable Dynamic Routing (SDR), which trains multiple sub-nets on data subsets and dynamically routes among them to select the best one for each task, achieving state-of-the-art averaged accuracy over 11 classification tasks and AP on PASCAL VOC detection.
Self-supervised learning (SSL), especially contrastive methods, has attracted considerable attention recently because it learns effective transferable representations without semantic annotations. A common practice for self-supervised pre-training is to use as much data as possible. For a specific downstream task, however, including irrelevant data in pre-training may degrade downstream performance, as observed in our extensive experiments. On the other hand, with existing SSL methods it is burdensome, and often infeasible, to pre-train on a different downstream-task-customized dataset for each task. To address this issue, we propose a novel SSL paradigm called Scalable Dynamic Routing (SDR), which can be trained once and deployed efficiently to different downstream tasks with task-customized pre-trained models. Specifically, we construct the SDRnet with various sub-nets and train each sub-net on only one subset of the data using data-aware progressive training. When a downstream task arrives, we route among all the pre-trained sub-nets to select the best one along with its corresponding weights. Experimental results show that SDR can train 256 sub-nets on ImageNet simultaneously and provides better transfer performance than a unified model trained on the full ImageNet, achieving state-of-the-art (SOTA) averaged accuracy over 11 downstream classification tasks and AP on the PASCAL VOC detection task.
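The routing step described above can be illustrated with a minimal sketch. The abstract does not specify the routing criterion, so the code assumes a generic per-task scoring function (for example, accuracy of a cheap probe on a small held-out set of the downstream task). The names `route_best_subnet`, `subnets`, and `score_fn` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Hashable, Tuple, TypeVar

W = TypeVar("W")  # placeholder type for a sub-net's pre-trained weights


def route_best_subnet(
    subnets: Dict[Hashable, W],
    score_fn: Callable[[W], float],
) -> Tuple[Hashable, W, float]:
    """Pick the sub-net with the highest downstream score.

    Assumption: `score_fn` is a stand-in for whatever task-specific
    criterion is used to compare sub-nets; the abstract does not define it.
    """
    if not subnets:
        raise ValueError("no sub-nets to route among")
    # Score every pre-trained sub-net once on the downstream task.
    scores = {subnet_id: score_fn(weights) for subnet_id, weights in subnets.items()}
    # Keep the highest-scoring sub-net (first one wins on ties).
    best_id = max(scores, key=scores.get)
    return best_id, subnets[best_id], scores[best_id]


if __name__ == "__main__":
    # Toy stand-in: each "sub-net" is a single number, and the downstream
    # task prefers values close to 1.0.
    toy_subnets = {"subset_a": 0.1, "subset_b": 0.9, "subset_c": 0.5}
    best_id, weights, score = route_best_subnet(
        toy_subnets, lambda w: -abs(w - 1.0)
    )
    print(best_id, weights, score)
```

In this sketch the cost of routing grows linearly with the number of sub-nets, so with 256 candidates the scoring function would need to be cheap relative to full fine-tuning. Whether SDR scores sub-nets exhaustively or uses a more efficient selection scheme is not stated in the abstract.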