Self-Masking Networks for Unsupervised Adaptation
This work addresses the challenge of efficient fine-tuning for computer vision models when labeled data is scarce, offering a practical solution for domain-specific applications.
The paper tackles the problem of adapting large pretrained models to downstream tasks with limited labeled data by proposing self-supervised masking networks (SMNs) that learn binary masks, resulting in up to 79x storage efficiency and significant performance improvements in label-efficient settings.
With the advent of billion-parameter foundation models, efficient fine-tuning has become increasingly important for the adaptation of models to downstream tasks. However, especially in computer vision, it can be hard to achieve good performance when access to quality labeled data is lacking. In this work, we propose a method adapting pretrained generalist models in a self-supervised manner by learning binary masks. These self-supervised masking networks (SMNs) are up to 79x more efficient to store and significantly improve performance on label-efficient downstream tasks. We validate the usefulness of learning binary masks as a fine-tuning method on 8 datasets and 3 model architectures, and we demonstrate the effectiveness of SMNs in 3 label-efficient settings.