CV | May 26, 2022

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

arXiv:2205.13267v1 | 30 citations | h-index: 57
Originality: Highly original
AI Analysis

This paper addresses the inefficiency of customizing self-supervised pre-training for each downstream task, offering practitioners a scalable solution.

The paper targets a specific problem: irrelevant data in self-supervised pre-training can degrade downstream task performance. It proposes Scalable Dynamic Routing (SDR), which trains multiple sub-nets, each on a subset of the data, and then routes among them to select the best sub-net for each downstream task. The authors report state-of-the-art averaged accuracy over 11 downstream classification tasks and state-of-the-art AP on PASCAL VOC detection.

Self-supervised learning (SSL), especially contrastive methods, has attracted attention recently because it learns effective transferable representations without semantic annotations. A common practice for self-supervised pre-training is to use as much data as possible. For a specific downstream task, however, involving irrelevant data in pre-training may degrade downstream performance, as observed in our extensive experiments. On the other hand, for existing SSL methods, it is burdensome and infeasible to pre-train on a different downstream-task-customized dataset for each task. To address this issue, we propose a novel SSL paradigm called Scalable Dynamic Routing (SDR), which can be trained once and deployed efficiently to different downstream tasks with task-customized pre-trained models. Specifically, we construct the SDRnet with various sub-nets and train each sub-net with only one subset of the data by data-aware progressive training. When a downstream task arrives, we route among all the pre-trained sub-nets to get the best one along with its corresponding weights. Experimental results show that SDR can train 256 sub-nets on ImageNet simultaneously and provides better transfer performance than a unified model trained on the full ImageNet, achieving state-of-the-art (SOTA) averaged accuracy over 11 downstream classification tasks and SOTA AP on the PASCAL VOC detection task.
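The routing step described in the abstract (pick the best pre-trained sub-net for an incoming downstream task) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how sub-nets are scored, so the `score` callable, the `route` function, and the toy sub-net values below are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Hashable, Tuple, TypeVar

W = TypeVar("W")  # placeholder type for a sub-net's weights


def route(subnets: Dict[Hashable, W],
          score: Callable[[W], float]) -> Tuple[Hashable, W]:
    """Return (id, weights) of the sub-net with the highest score.

    `score` is an assumed task-specific measure (higher is better);
    how SDR actually ranks sub-nets is not given in the abstract.
    """
    if not subnets:
        raise ValueError("no pre-trained sub-nets to route among")
    best_id = max(subnets, key=lambda k: score(subnets[k]))
    return best_id, subnets[best_id]


# Toy stand-ins: each "sub-net" is a single number, and the
# hypothetical downstream task prefers values close to 3.0.
toy_subnets = {"subnet_a": 1.0, "subnet_b": 2.5, "subnet_c": 7.0}
best_id, best_weights = route(toy_subnets, score=lambda w: -abs(w - 3.0))
```

In a real setting the dictionary would hold the pre-trained sub-net weights, and `score` would be some measure of how well each sub-net suits the downstream data; both are left abstract here because the summary does not specify them.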

