LG · CV · Jan 22, 2023

Unifying Synergies between Self-supervised Learning and Dynamic Computation

arXiv:2301.09164v3 · h-index: 66
Originality: Highly original
AI Analysis

This work addresses the problem of high computational costs in SSL for industrial applications, offering a more efficient training strategy without fine-tuning.

The paper tackles the computational inefficiency of self-supervised learning (SSL) in resource-constrained settings by proposing a method to simultaneously learn dense and gated sub-networks from scratch, achieving on-par performance with vanilla SSL while significantly reducing FLOPs across benchmarks like CIFAR-10/100, STL-10, and ImageNet-100.

Computationally expensive training strategies make self-supervised learning (SSL) impractical for resource-constrained industrial settings. Techniques like knowledge distillation (KD), dynamic computation (DC), and pruning are often used to obtain a lightweight model, but they usually involve multiple epochs of fine-tuning (or distillation steps) of a large pre-trained model, which adds further computational cost. In this work we present a novel perspective on the interplay between the SSL and DC paradigms. In particular, we show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting without any additional fine-tuning or pruning steps. The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off and therefore yields a generic, multi-purpose architecture for application-specific industrial settings. Extensive experiments on several image classification benchmarks, including CIFAR-10/100, STL-10, and ImageNet-100, demonstrate that the proposed training strategy provides a dense network and a corresponding gated sub-network that achieve on-par performance compared with the vanilla self-supervised setting, but at a significant reduction in computation in terms of FLOPs, under a range of target budgets (t_d).
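To make the idea of a FLOPs target budget concrete, here is a minimal sketch of how the compute of a gated sub-network might be measured against its dense counterpart. The abstract does not describe the gating mechanism or the budget loss, so everything below is an illustrative assumption rather than the paper's method: the `ConvLayer` description, binary per-channel gates, the chained-layer layout, and the squared-error `budget_penalty` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution (not from the paper)."""
    c_in: int    # input channels
    c_out: int   # output channels
    kernel: int  # square kernel size
    out_h: int   # output feature-map height
    out_w: int   # output feature-map width


def conv_flops(layer: ConvLayer,
               active_in: Optional[int] = None,
               active_out: Optional[int] = None) -> int:
    """Multiply-accumulate count when only some channels are active."""
    c_in = layer.c_in if active_in is None else active_in
    c_out = layer.c_out if active_out is None else active_out
    return layer.kernel ** 2 * c_in * c_out * layer.out_h * layer.out_w


def gated_flops_ratio(layers: List[ConvLayer], gates: List[List[int]]) -> float:
    """Fraction of dense FLOPs used by the gated sub-network.

    Assumes the layers are chained (layers[i].c_in == layers[i-1].c_out)
    and gates[i] holds one 0/1 decision per output channel of layers[i].
    """
    dense = sum(conv_flops(layer) for layer in layers)
    gated = 0
    active_in = layers[0].c_in  # the input image channels are never gated
    for layer, gate in zip(layers, gates):
        active_out = sum(gate)
        gated += conv_flops(layer, active_in, active_out)
        active_in = active_out  # closed channels feed nothing downstream
    return gated / dense


def budget_penalty(ratio: float, target: float) -> float:
    """Assumed squared-error penalty pulling the FLOPs ratio toward t_d."""
    return (ratio - target) ** 2


if __name__ == "__main__":
    net = [ConvLayer(3, 8, 3, 32, 32), ConvLayer(8, 16, 3, 32, 32)]
    half_open = [[1] * 4 + [0] * 4, [1] * 8 + [0] * 8]
    ratio = gated_flops_ratio(net, half_open)
    print(f"FLOPs ratio: {ratio:.3f}, penalty at t_d=0.5: "
          f"{budget_penalty(ratio, 0.5):.4f}")
```

In a real training loop the gates would be learned (and typically relaxed to be differentiable) so that a penalty of this kind could be added to the SSL objective. The sketch only shows the bookkeeping that relates gate decisions to a target budget.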

Code Implementations: 1 repo
